I.   INTRODUCTION.


The Conference on Data Systems Languages (CODASYL) has defined a data base system [Refs. 2 and 3] which is partially generalized and partially tailored to COBOL. This system is based on network data modeling techniques. CODASYL has claimed that other languages could be effectively interfaced to the system and that the system could be implemented in a variety of environments. The Computer Science Department has two Digital Equipment Corporation PDP-11/50 computers running the UNIX operating system [Ref. 1]. The equipment was acquired for research in signal processing applications. A CODASYL-based data base management system has never been introduced in this environment.

Relational data modeling techniques are the major competitor of the CODASYL system. Recently, development was completed on a relational data base management system which runs under UNIX [Refs. 4 and 5]. This system is called INGRES and was developed at the University of California at Berkeley. A discussion is currently taking place in the literature over the relative merits and drawbacks of relational models versus network models (chiefly the CODASYL version) [Refs. 6, 7, 8 and 9]. Although much has been written about the merits of each model, relatively little empirical comparison has been done. Therefore, since steps are being taken to acquire the INGRES software, it was decided that adding a CODASYL data base management system would provide a complete suite of data management software.

The tasks to be accomplished were the design and implementation of a UNIX-hosted CODASYL system, the design and implementation of a C language [Ref. 10] interface to this system, the acquisition of INGRES, and comparative studies of the two systems for signal processing applications. This thesis documents the design and implementation of a UNIX-hosted CODASYL data base management system and the design of a C language interface to this system.




II.   BACKGROUND. 


A.   Data  Access  Methods  -  A  History. 

1.   Technological  Effects. 

During the first and second generations of computer hardware, data storage media were tapes, relatively slow disks and drums, and the omnipresent punched card. Data storage and retrieval concepts were shaped by these devices, especially the punched card. A file was therefore a sequentially ordered and accessed, contiguously stored group of records. All the records were of fixed length. This view is also oriented toward a monoprogramming environment with absolute separation of one user's files from another's.

With the advent of third generation technology, several factors began to affect data storage and retrieval concepts. Foremost was the development of fast, high capacity, relatively inexpensive direct access storage devices. These devices stimulated the development of a whole range of new access techniques, such as hash coded and indexed file organization. Secondly, the multi-user environment caused a breakdown of the sharp division between the execution environments of users. This breakdown was accompanied by a rethinking of the relationships between the overlapping data requirements of users. All these factors led to the development of a new kind of file system known as a data base.

2.   Access Methods.

The inspiration for a multi-purpose data base came from management systems in which it was discovered that vast overlap and duplication of data was occurring between different groups in a company. For example, the payroll and personnel sections would typically each have employee files which were stored and maintained separately but which overlapped by 80 per cent in data content. An early management information system (MIS) which attacked this problem was IBM's Bill of Material Program (BOMP) [Ref. 12], which allowed the structuring of a parts list with subassemblies, each having its own parts list; this facilitated the management of manufacturing inventories. When the subassemblies occurred in many different parts, the savings afforded by avoiding data duplication were significant. The BOMP used a relatively flexible list structure and marked a significant departure from traditional file organization.

With the advent of the consolidated multi-purpose data base, a whole new level of data structuring was imposed on the techniques for physically mapping files to devices. These data structures employed relatively complex methods from graph theory and other disciplines which had previously been used only on relatively small amounts of data residing in main storage.

This new level of data base structure, combined with the fact that a very large portion of the data base is typically on-line, has imposed on programmers the requirement for a new skill. This skill has been called navigation through the data base [Ref. 13]. In this view, a programmer must travel via access paths through the data base, searching for landmarks until he has located the data he desires. Choosing an inappropriate access path can be extraordinarily inefficient and costly in time, so the penalty for a lack of navigational skill is high. It would obviously be desirable to remove as much of the burden of navigation from the programmer as is practical. The development of modern data base management systems (DBMS) has been made difficult by the dilemma of desiring both optimal access paths and ease and simplicity of use for the programmer.

3.   Terminology.

This section will attempt to define the terms used in this field of inquiry. The following definition of a data base is due to Ref. 13:

"A  data  base  may  be  defined  as  a  collection  of 
interrelated data stored together with as little redundancy as possible to serve one or more applications in an optimal fashion; the data are stored so that they are independent of programs which use the data; a common and controlled approach is used in adding new data and in modifying and retrieving existing data within the data base. One system is said to contain a collection of data bases if they are entirely separate in structure."

Two types of languages are mentioned in connection with a DBMS. The first is the Data Description Language (DDL), which describes the types of data entities which may exist along with their allowable attributes. There may be two DDLs, or two levels of DDL, for describing a data base. The first level description is the system's view of the data base as it is actually organized, and the second is a user's view of the data base. These levels are called the schema and sub-schema, respectively. In relational model terminology, the DDL may be called the relational algebra.

The second language is the Data Manipulation Language (DML), which is concerned with the storage, retrieval and modification of specific occurrences of the entity types described by DDL statements. In relational model terminology, this language corresponds to the relational calculus. The entities handled by the DDL and DML may be records, sets or anything else that may need manipulation. The attributes may be such things as data items, set membership, set ownership or location within the data base.

The data base model is the meta-structure which is imposed on the organization of the data base. The model prescribes the types of entities which are allowed. It defines the data attributes and structural attributes that an entity may have. The definition of a DDL and DML is the implementation of the meta-structures of a data base model. Currently the two most widely discussed models are the network model and the relational model.

4.   Goals  of  a  DBMS. 

The following goals have been proposed for a DBMS [Ref. 3]:

-  Allow the data structures suited to each particular application, while permitting multiple applications to use the data without the need for data redundancy.

-  Allow more than one process to concurrently retrieve or update data in the data base.

-  Enable the use of a variety of search strategies against an entire data base or a portion of it.

-  Provide protection of data from unauthorized access.

-  Provide centralized control over the placement of data.

-  Provide device independence for programs.

-  Allow the user to interact with the data but be free of the mechanics of maintaining the structural associations which have been declared.

-  Allow as great an independence of programs from data and structures as possible.

-  Make the data description independent of any particular programming language, but give it the capability of interfacing with a variety of programming languages.

These goals seem to be generally agreed on in the literature as being reasonably complete. There is considerable controversy, however, over the relative importance of individual goals. In particular, some contend that the primary goal should be to allow the user to be free of the data base structures entirely [Ref. 9].

B.   The  Network  Model. 

One of the two data base models which have received wide attention is the network model. This model is grounded in graph theory, and relationships between data are represented by some form of directed graph. The nodes of the graph may be entities containing data attributes or may simply be place holders whose only attributes are the arcs of the graph. The arcs represent logical links between the entities, which can be travelled in the direction of the arc to navigate through the data base. Thus, even though the implementation of the arcs may be transparent to the user, the access paths are visible to the user as part of the structure of the data base. The DML is said to be prescriptive of the data access paths; that is, it must prescribe the course through the data structures.

Various restrictions as to the type of network allowed may be imposed on a network model. For example, the graphs may be required to be acyclic, or the structures may be restricted to trees, chains or lists. A non-homogeneous model has the restriction that if two nodes are connected by an arc, then the entities represented by the nodes may not be of the same type. A special class of the non-homogeneous network model is the hierarchical model. Hierarchical models have the following restrictions: the graphs must be trees, an entity of any type may appear only once on a particular branch of the tree, and certain entity types must always appear on a given branch at a higher level than other entity types. An example of a hierarchical data base would be one with the entities country, state, county and city. For a particular country, one or more of the entities state and county may be left out between a country and its cities, but a city cannot appear above a state, nor can it appear above another city. An example of a non-hierarchical data base is the BOMP, in which a subassembly may contain other subassemblies, which may in turn contain subassemblies. Note that the BOMP is homogeneous, since subassemblies are linked to subassemblies.

C.   The  Relational  Model. 

The second of the two most discussed models for data representation is the relational model. This model is grounded in set theory, specifically in the concept of a relation in the mathematical sense. Given sets $S_1, S_2, \ldots, S_n$ (not necessarily distinct), $R$ is a relation on these sets if it is a subset of the Cartesian product $S_1 \times S_2 \times \cdots \times S_n$. The elements of $R$ are n-tuples whose $j$th component is from $S_j$, for $j$ from 1 to $n$. $R$ is said to be an n-ary relation, or of degree $n$, and $S_j$ is called the $j$th domain of $R$.

In the relational model, each set $S_j$ must be a set of like-typed attributes. The relations are named and are time-variant according to some maintenance algorithm. An example of a relation would be a set called time-spectrum made up of (frequency, time, amplitude) triples. This ternary relation might represent the latest 30 minutes of data from a hydrophone. Every relation must have a key by which the tuples can be identified. The key must be unique (no two tuples with the same key) and non-redundant (the whole key is needed for identification). In the above example, frequency and time make up the key.
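A minimal C sketch of the time-spectrum relation follows, assuming frequency and time are stored as integers (for example, a frequency bin and a time in seconds) and the relation is held in a fixed-size array; the names `spectrum_tuple`, `spectrum_relation` and `relation_insert` are hypothetical. Insertion rejects any tuple whose key (frequency, time) is already present, which is what the uniqueness requirement demands.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One tuple of the ternary relation time-spectrum. */
struct spectrum_tuple {
    int    frequency;  /* key component 1 (assumed: frequency bin) */
    int    time;       /* key component 2 (assumed: seconds)       */
    double amplitude;  /* non-key attribute                         */
};

#define MAX_TUPLES 1024   /* assumed capacity */

struct spectrum_relation {
    struct spectrum_tuple tuples[MAX_TUPLES];
    size_t count;
};

/* Two tuples have the same key only when both key components match. */
static bool same_key(const struct spectrum_tuple *a,
                     const struct spectrum_tuple *b)
{
    return a->frequency == b->frequency && a->time == b->time;
}

/* Adds t to the relation unless a tuple with the same key already exists.
   Returns false on a duplicate key or when the relation is full. */
bool relation_insert(struct spectrum_relation *r, struct spectrum_tuple t)
{
    assert(r != NULL);
    if (r->count == MAX_TUPLES)
        return false;
    for (size_t i = 0; i < r->count; i++)
        if (same_key(&r->tuples[i], &t))
            return false;
    r->tuples[r->count++] = t;
    return true;
}
```

Two readings at the same frequency but different times are both accepted, which reflects non-redundancy: neither frequency nor time alone identifies a tuple. The linear scan is used only for brevity; a practical implementation would index the key.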

The chief advantage of the relational model is that the user's view of the data is independent not only of the physical mapping to media, but also of the access paths involved. The chief disadvantage is the difficulty of devising an implementation which is reasonably efficient for all applications [Ref. 8].

A good deal of work has been done on normalizing relations to remove undesirable data representational characteristics, and on providing appropriate operations and transformations for relations. References 14, 15 and 16 give a definitive exposition of the theory of relational data models.

D.   The CODASYL DBTG DBMS.

1.   History.

CODASYL is an informal and voluntary organization of interested individuals, supported by their institutions, who contribute their efforts and expenses towards the ends of designing and developing techniques and languages to assist in data systems analysis, design and implementation. Founded in 1959, its most famous achievement has been the definition of the COmmon Business Oriented Language (COBOL). In June 1965, the CODASYL COBOL Language Subcommittee of the Programming Languages Committee (PLC) resolved to organize a task force to study list processing. In November 1965, this task force produced a proposed list processing extension to COBOL for file management. In May 1967, the List Processing Task Force changed its name to the Data Base Task Group (DBTG) and undertook a comparative study of data base management techniques and systems. This study culminated in the publication of an interim report in February 1968 and the agreement by the Language Subcommittee that "COBOL needs the Data Base Concept" [Ref. 2]. At the Tenth Anniversary Meeting of CODASYL, held in May 1969, consideration was given to separating the data description and data manipulation languages. The idea received wide endorsement at the meeting and was the basis for the direction of efforts by the DBTG until October 1969, at which time Ref. 17 was presented. From the publication of Ref. 17 until the publication of Ref. 3 in April 1971, 179 proposals for changes and extensions to the DDL and DML were considered, of which 130 were incorporated into Ref. 3. In June 1971, it was decided that the schema DDL should be developed separately from the COBOL DDL and DML. Accordingly, the Data Description Language Committee (DDLC) was formed as a separate organization from the PLC. The DDLC proceeded with modifications and enhancements to the DBMS and schema DDL definitions and, in June 1973, produced Ref. 2. This document is currently the basis for the CODASYL DBMS.

2.   Terminology and Concepts.

For a complete description of the CODASYL schema DDL statements and DBMS design, see Ref. 2. The schema DDL is used to describe a data base and has the following entity types: data items, data aggregates, records, areas and sets.

A data item is an occurrence of a named atomic data attribute. It is the smallest unit of named data. The set of values that a data item can assume is called its range. The range of an item is always restricted to values of a particular type. The possible types are arithmetic data, string data, data base keys and implementor-defined types.

A data aggregate is an occurrence of a named collection of data items. There are two kinds: vectors and repeating groups. A vector is a one-dimensional sequence of data items, all with identical characteristics. A repeating group is a collection of data attributes that occurs multiple times within a record occurrence. The collection of attributes may include data items and data aggregates.

A record is an occurrence of a named collection of zero or more data items or data aggregates. Each record entry defines a record type, of which there may be zero or more occurrences within the data base. The record is the smallest addressable entity within the data base.
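As a rough analogy only (this is not the C DDL designed in this thesis), a record type containing data items, a vector and a repeating group might be mirrored by C declarations like the following. The record `hydrophone_rec`, its fields, the function `add_reading` and the bounds `N_SAMPLES` and `MAX_READINGS` are hypothetical; a fixed upper bound is an assumption, since a repeating group may occur a variable number of times within a record occurrence.

```c
#include <assert.h>
#include <string.h>

#define N_SAMPLES    8   /* assumed vector length */
#define MAX_READINGS 4   /* assumed bound on repeating-group occurrences */

/* Repeating group: a collection of data attributes that occurs several
   times within one record occurrence.  Here it contains a vector. */
struct reading {
    long  time;                 /* data item */
    short samples[N_SAMPLES];   /* vector: items with identical characteristics */
};

/* Record type: a named collection of data items and data aggregates. */
struct hydrophone_rec {
    char   station[8];                      /* data item (string data)     */
    double gain;                            /* data item (arithmetic data) */
    int    n_readings;                      /* occurrences in use          */
    struct reading readings[MAX_READINGS];  /* repeating group             */
};

/* Appends one occurrence of the repeating group to a record occurrence.
   Returns the new number of occurrences, or -1 if the bound is reached. */
int add_reading(struct hydrophone_rec *rec, long time, const short *samples)
{
    assert(rec != NULL && samples != NULL);
    if (rec->n_readings == MAX_READINGS)
        return -1;
    struct reading *rd = &rec->readings[rec->n_readings++];
    rd->time = time;
    memcpy(rd->samples, samples, sizeof rd->samples);
    return rec->n_readings;
}
```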

A set is a named collection of records. Each set entry in the schema defines a set type for which zero or more occurrences (sets) may exist in the data base. Each set type declared in the schema must have one record type declared as its owner and may have one or more record types declared as its members. Each set occurrence which exists in the data base must contain exactly one record of its owner type and zero or more records of its member record types. A special set type may be declared which has one and only one occurrence and whose owner is the DBMS. A set so declared is said to be a singular set. There is no provision for a record type to be both an owner and a member record type of the same set. This means the CODASYL model is non-homogeneous. It is not, however, hierarchical, in that set types may be defined with ownership and membership such that cycles can occur.
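The owner/member structure of a set can be sketched in C as follows, assuming a set type with a single member record type and a simple linked chain of members; the record types and the names `set_type`, `set_occurrence`, `set_type_valid` and `set_connect` are hypothetical. The validity check encodes the rule that a record type may not be both owner and member of the same set type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record types of a small schema. */
enum record_type { DEPARTMENT, EMPLOYEE, PROJECT };

/* A set type: one owner record type and (in this sketch) one member type. */
struct set_type {
    const char      *name;
    enum record_type owner;
    enum record_type member;
};

/* Minimal record occurrence with one link field for one set type. */
struct record {
    enum record_type type;
    int              id;
    struct record   *next_member;   /* next member in the same set occurrence */
};

/* A set occurrence: exactly one owner and zero or more members. */
struct set_occurrence {
    const struct set_type *type;
    struct record         *owner;
    struct record         *first_member;
};

/* A record type may not be both owner and member of the same set type. */
bool set_type_valid(const struct set_type *st)
{
    return st->owner != st->member;
}

/* Links rec into the set occurrence (at the head, for simplicity) after
   checking that it is of the declared member type. */
bool set_connect(struct set_occurrence *s, struct record *rec)
{
    assert(s != NULL && rec != NULL && s->owner != NULL);
    if (rec->type != s->type->member)
        return false;
    rec->next_member = s->first_member;
    s->first_member = rec;
    return true;
}
```

Cycles remain possible across set types: a second set type with EMPLOYEE as owner and DEPARTMENT as member would pass the same check, which is why the model is non-homogeneous but not hierarchical.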

An area is a named collection of records which need not preserve owner/member relations. An area may contain occurrences of multiple record types, and a record type may occur in multiple areas. A particular record occurrence is assigned to an area when it is created, and it may not migrate out of that area. An area may be declared to be temporary. Temporary areas are created especially for a run-unit, exist for the life of the run-unit and are destroyed when the process terminates. Two run-units may have a particular temporary area open concurrently, but each run-unit is using a different version of the area which is unique to that particular run-unit. The concept of area allows the subdivision of the data base. It allows the DBMS to control placement of an entire area to provide efficient storage and retrieval. Areas are a convenient unit for recovery and also provide a convenient, natural subdivision for allowing a part of the data base to be removed to off-line storage.

A schema consists of DDL entries and is a complete description of a data base. It includes the names and descriptions of all areas, set types and record types that may appear in the data base. A data base is the totality of all records, sets and areas controlled by a schema. For an installation to have multiple data bases, it must have multiple schemas, and the contents of the data bases must be disjoint.

No schema DDL entry may include references to physical devices or media space. Thus a schema written in the DDL is independent of the physical storage of data, and the data may be stored on any combination of storage media available to a DBMS. Some devices, due to their sequential nature, may not allow the full advantages of DDL facilities; however, the use of these devices is not precluded.

A program is a set or group of instructions. User programs must have access to a sub-schema DDL description of the portion of the data base they are interested in. Additionally, they must be able to use a DML to interact with the data base through the DBMS.

A run-unit is the execution of one or more programs viewed by the operating system as a unit. Under OS/360, the run-unit might be a job and, under UNIX, a parent process and any children. The run-unit makes requests of the DBMS, which in turn consults the schema and interacts with the operating system to fulfill the request.

A user working area (UWA) is conceptually a loading and unloading zone where all data provided to a run-unit by the DBMS, and all data to be picked up by the DBMS, must be placed. The DBMS has its own system buffers, which it uses to manipulate the data base. It uses the UWA only for input and output of data for the requesting run-unit. Each run-unit has its own UWA.

3.   The  Schema  vs.  the  Sub-schema. 

The sub-schema has the following characteristics. An arbitrary number of possibly overlapping sub-schemas may be declared. Multiple programs may reference a sub-schema, but they have access only to that portion of the data base included in the sub-schema. Thus, the sub-schema DDL description enables the subsetting of the data base, so that a user program need only be concerned with the portion of the data base it uses, and it insulates the remainder of the data base from the user. A measure of data independence is provided between the schema and sub-schema. The sub-schema description may differ from the schema in the following ways.

a.  Data Item Level.

Descriptions  of  items  may  be  omitted.  Included 
items  may  be  of  a  different  type  or  in  a  different  position 
within  the  record. 

b.  Data  Aggregate  Level. 

Descriptions of specific data aggregates may be omitted. Data aggregates and items may have additional structure imposed on them (e.g. vectors may become multi-dimensional arrays). The position of data aggregates within a record may be changed.

c.  Record Level.

Descriptions of records may be omitted. Descriptions of new record types composed of data from other record types may be introduced (not supported by the COBOL or C DDLs).

d.  Set Level.

Descriptions of specific set types may be omitted. Different set selection criteria may be specified. Descriptions of specific member record types may be omitted.

e.  Area Level.

Descriptions of specific areas and the records within them may be omitted, while occurrences of the same record type in other areas are included.
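The item-level differences can be pictured as a mapping from a schema record to a sub-schema view of it. In this C sketch, the structures `emp_schema` and `emp_subschema` and the function `to_subschema` are illustrative assumptions, not part of the CODASYL specification: the salary item is omitted, `dept_no` changes position and `emp_no` changes type.

```c
#include <assert.h>
#include <string.h>

/* Schema view of a hypothetical record type. */
struct emp_schema {
    int    emp_no;
    char   name[20];
    double salary;
    int    dept_no;
};

/* One possible sub-schema view of the same record: salary is omitted,
   dept_no is moved ahead of name, and emp_no is presented as a long. */
struct emp_subschema {
    long emp_no;
    int  dept_no;
    char name[20];
};

/* Builds the sub-schema image of a schema record occurrence, the kind of
   mapping a DBMS would perform when delivering data to a run-unit. */
void to_subschema(const struct emp_schema *in, struct emp_subschema *out)
{
    assert(in != NULL && out != NULL);
    out->emp_no  = (long)in->emp_no;
    out->dept_no = in->dept_no;
    memcpy(out->name, in->name, sizeof out->name);
}
```

A program compiled against `emp_subschema` cannot see the salary item at all, which is the insulation described above.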

4.  The Schema and the DML.

The relationship between the DDL and the DML is that between declarations and procedures. In order to specify this relationship, a set of basic data manipulation functions must be defined which is independent of both the DML and the host language. Specific commands provided by a particular DML must be resolved into these basic functions. Basic functions include the capability of selecting records, presenting them to a run-unit, and adding, changing or removing records and relationships.

5.  Data  Base  Administration. 

Certain facilities must be available to support the user programs. These tools are not defined in the CODASYL DBMS and may include the following.

a.   Recovery Routines.

Data base recovery routines may be used, including activity logging, checkpoint and rollback.

b.   Utility Routines.

Utility and service routines are required to support a data base in day-to-day operations. Examples include routines for editing and printing, loading and dumping, preconditioning, garbage collection, statistical analysis and comparison.

c.  Schema Meta-language.

This language permits changes in the schema and causes them to be reflected in the data base. Without such a language, the changes must be made by defining a new schema and recreating the data base accordingly.

d.  Device Media Control Language (DMCL).

This language provides for the assignment of data to devices and media space, and for the specification and control of buffering, paging and overflow.

6.   Data Base Procedures.

At various points in the accessing of a data base, non-standard computations or processing may be required. To allow for these situations, the capability is provided to define data base procedures. These procedures may be invoked for checking of privacy locks, producing computed results from other items, searching algorithms, data compression and expansion, validity checking or system instrumentation.

7.   Record  Placement  Control. 

The schema DDL permits specification of an area or areas to which record occurrences of a particular type must be assigned. The schema DDL also includes a clause which causes records being added to be placed near some other record. Conceptually, the effect of such clauses is to cause clustering of records which are likely to be used in conjunction with one another. These declarations for selecting the area and the location within the area are the WITHIN clause and the LOCATION clause, respectively, of the record subentry. The fact that the schema DDL permits placement control is not assumed by CODASYL to have any physical connotations.

8.   Data Base Keys.

The DDL assumes that every record occurrence in the data base has a unique identifier which enables the DBMS to distinguish it from every other record in the data base. This key must be assigned when the record is created and remains with it for the life of the record. This key may be supplied to the DBMS by a run-unit or data base procedure, generated from the record's contents, or assigned by the DBMS. The permanence of the key must be ensured, since any run-unit may use the keys to refer to the record.
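The permanence requirement is easiest to meet when the DBMS assigns keys itself and never reuses them. The C sketch below covers only that case (not keys supplied by a run-unit or derived from record contents); the type `db_key`, the structure `key_table` and its functions are hypothetical, and the fixed table size is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned long db_key;   /* hypothetical data base key type */

#define MAX_RECORDS 64          /* assumed capacity */

/* Key table: maps a data base key to the storage slot holding the record. */
struct key_table {
    db_key next_key;               /* next key the DBMS will hand out */
    int    slot[MAX_RECORDS + 1];  /* slot[k] = storage slot, -1 if none */
};

void key_table_init(struct key_table *kt)
{
    kt->next_key = 1;              /* key 0 is reserved to mean "no record" */
    for (int i = 0; i <= MAX_RECORDS; i++)
        kt->slot[i] = -1;
}

/* Assigns a new key when a record is created.  Keys are never reused, so a
   key held by any run-unit can never come to refer to a different record. */
db_key key_assign(struct key_table *kt, int storage_slot)
{
    assert(kt->next_key <= MAX_RECORDS);
    db_key k = kt->next_key++;
    kt->slot[k] = storage_slot;
    return k;
}

/* Deleting a record invalidates its key but does not recycle it. */
void key_release(struct key_table *kt, db_key k)
{
    assert(k > 0 && k < kt->next_key);
    kt->slot[k] = -1;
}

/* Resolves a key to a storage slot; fails for unknown or deleted keys. */
bool key_lookup(const struct key_table *kt, db_key k, int *storage_slot)
{
    if (k == 0 || k >= kt->next_key || kt->slot[k] < 0)
        return false;
    *storage_slot = kt->slot[k];
    return true;
}
```

Because a deleted key is never reissued, a run-unit still holding that key gets a lookup failure rather than silently reaching a different record.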

9.  Ordering of Sets.

Each set type declared in the schema must have an ordering specified for it. This order is maintained by the DBMS and is a logical, not a physical, ordering. Thus, the same record occurrences could participate as members in several sets of different types and be ordered differently in each of the sets. The member records of each occurrence of a given set type can be ordered in any of the following ways:

-  Sorted in ascending or descending order based on the value of specified keys. These keys may be data items in the member records, the names of the member records, the data base keys of the member records, or some combination of the above.

-  Sorted in the order resulting from inserting new member records first in the set, last in the set, or before or after the set member which is currently known to the requesting run-unit.

-  Sorted  in  the  order  most  convenient  to  the  DBMS. 
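Three of the ordering options can be illustrated with a member chain in C. This is a sketch under simplifying assumptions: each member carries a single integer sort key and a single link field, and the names `set_order`, `member` and `set_insert` are hypothetical. Insertion before or after the current member, and descending order, are omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ordering options for a set type. */
enum set_order { ORDER_FIRST, ORDER_LAST, ORDER_SORTED_ASC };

struct member {
    int            key;    /* sort key: a data item in the member record */
    struct member *next;   /* link for this one set type */
};

/* Inserts m into the member chain headed by *head according to order.
   The chain is a logical ordering only; it says nothing about where the
   records are physically stored. */
void set_insert(struct member **head, struct member *m, enum set_order order)
{
    assert(head != NULL && m != NULL);
    struct member **link = head;
    switch (order) {
    case ORDER_FIRST:
        break;                              /* insert at the head */
    case ORDER_LAST:
        while (*link != NULL)
            link = &(*link)->next;          /* walk to the end */
        break;
    case ORDER_SORTED_ASC:
        while (*link != NULL && (*link)->key <= m->key)
            link = &(*link)->next;          /* stop before first larger key */
        break;
    }
    m->next = *link;
    *link = m;
}
```

Equal keys are placed after existing ones, so insertion order is preserved among duplicates. A record that belongs to several set types would need one link field per set type, which is how the same record can be ordered differently in each set.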

10.  Search  Keys. 

An arbitrary number of search keys may be declared for a set type, regardless of whether it is sorted or not. The components of the search keys must be data items included in the member records of the set. The declaration of a search key causes the DBMS to develop and use some kind of indexing for the member records of each occurrence of the set type. The term indexing is used here to refer to any technique which does not involve a complete scan of the records involved.
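One technique that satisfies this definition of indexing is a sorted array of (search key, data base key) pairs searched by binary search. The sketch below assumes integer search keys and uses the standard C functions `qsort` and `bsearch`; the structure `index_entry` and the functions `index_build` and `index_find` are hypothetical, data base key 0 is assumed to mean "not found", and a hash table or tree would serve equally well.

```c
#include <assert.h>
#include <stdlib.h>

/* One index entry: a search-key value and the data base key of the member
   record that carries it. */
struct index_entry {
    int           search_key;
    unsigned long db_key;
};

/* Three-way comparison on the search key, as qsort and bsearch require. */
static int cmp_entry(const void *a, const void *b)
{
    const struct index_entry *x = a, *y = b;
    return (x->search_key > y->search_key) - (x->search_key < y->search_key);
}

/* Sorts the entries once so that later lookups avoid a complete scan. */
void index_build(struct index_entry *entries, size_t n)
{
    qsort(entries, n, sizeof entries[0], cmp_entry);
}

/* Returns the data base key of a member with the given search-key value,
   or 0 if there is none.  Binary search needs O(log n) comparisons. */
unsigned long index_find(const struct index_entry *entries, size_t n, int key)
{
    assert(entries != NULL);
    struct index_entry probe = { key, 0 };
    const struct index_entry *hit =
        bsearch(&probe, entries, n, sizeof entries[0], cmp_entry);
    return hit != NULL ? hit->db_key : 0;
}
```

One such index would be kept for each occurrence of the set type, so a lookup examines only the members of that occurrence.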

11.  Set  Membership. 

A record type may have different kinds of set membership declared for different set types. Automatic membership means that membership is established in an appropriate occurrence of a set type when a record is added to the data base. Manual membership means that membership can only be established in a set occurrence by a run-unit executing an insert function.

Mandatory or optional membership concerns the removal of a record from a set occurrence. Once a record has been established as a member of a set for which it has mandatory membership, it cannot be removed until the record is deleted. If the membership is optional, the record may cease to be a member via a remove function.

A set type may be declared as dynamic. A dynamic set may
have a record of any type inserted into it or removed from
it. If a set type is declared to be dynamic, no member
records may be declared for it.

12. Set Selection.


In general, there will be more than one set of a given type
in the data base. It is therefore necessary to provide a
means for identifying the proper set when member records are
stored and retrieved. The SELECTION clause of the member
subentry in the DDL controls the strategy for selecting a
specific set of a given type. A separate SELECTION clause is
required for each member record type and set type pair. The
SELECTION clause provides for naming a series of sets which
form a continuous path to the desired set. For all the sets
along the path other than the first named set, the DBMS
limits its search to the member records of the set selected
at the previous step in the path.

13. Privacy of Data.

Protection against unauthorized data access is provided
through a mechanism of privacy locks which are specified in
the schema. Privacy keys must be provided by a run-unit
seeking to access or alter data protected by a privacy lock.
The schema DDL provides for declaring privacy locks at the
schema, area, record, data item, data aggregate, set and
member levels. Locks can be declared for specific functions
at each of these levels. A privacy lock is either a value
which must be matched by a corresponding privacy key or a
data base procedure which is called to validate the privacy
key. If a procedure is used, it returns a yes or no answer;
beyond this, the action of such a procedure is implementor
defined.
14. Integrity of Data.

The DDL provides for checking the validity of a data item
whenever its value is changed or a new value is stored in
the data base. In addition, provision is made for naming
data base procedures which the DBMS invokes when a run-unit
attempts to update nominated records or sets. This feature
enables a check of any update or series of updates applied
to the data base.

E. Laboratory Equipment and Software.

The computer equipment in the laboratory consists of two
PDP 11/50's with associated peripherals. Little information
about the equipment is relevant to this thesis; however, it
should be noted that the DBMS was developed using an
interactive display terminal and is oriented toward that
environment.

The operating system which supports the DBMS is UNIX.
Reference 1 supplements the following discussion of UNIX.

1. The File System.

The most important function of UNIX is to provide a file
system. From the user's point of view there are three kinds
of files: ordinary files, directories and special files.
An ordinary file can contain any information the user
desires. The system imposes few structural requirements on
files; however, some programs expect files of a certain
format. A text file consists of a string of characters with
lines delimited by new-line characters. A binary program
file is a sequence of words as they will appear in main
memory when the program is executed. The assembler and
loader programs use special object file formats.

Directories provide the mapping between the names of files
and the files themselves, and thereby induce a structure on
the file system as a whole. A directory behaves exactly like
an ordinary file except that the system controls its format
and contents. Each system user has a directory associated
with his user name, and he may create sub-directories to
organize collections of his files. The system maintains
several directories for its own use. One is the root
directory. The directories in a file system form a tree, and
the root is the base of this tree. Thus any file in the
system can be located by tracing a path from the root
through the appropriate directories. Another system
directory contains all the programs which are used as system
commands; it is special only in that certain programs
"know" its name.

Each directory must appear as an entry in exactly one other
directory, called its parent. Each directory has two special
entries: the name "." refers to the directory itself and the
name ".." refers to the parent directory. These entries
enable reference to the directory and its parent without
knowing their names explicitly.

File names are strings of 14 or fewer characters. A file is
identified to the system by a string of directory names
separated by virgules ("/") and terminated by the desired
file name. This string is called a path name. When the path
name starts with a virgule, the system begins the path
search at the root directory; otherwise it starts at the
user's current working directory. For example, the path name
"/foxtrot/uniform/charlie" would cause the system to start
at the root, search it for directory "foxtrot", search
"foxtrot" for directory "uniform" and find file "charlie" in
"uniform". The file "charlie" could be any type of file,
including a directory. In another case, the path name "kilo"
would cause the system to search the user's current
directory for "kilo". The path name "/" refers to the root
itself.
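The first step of the path search described above, breaking a path name into its components and deciding where the search starts, can be sketched in C as follows. The function `split_path` is a hypothetical helper, not a UNIX system call; it does not consult any directories, and it rejects components longer than the 14-character limit.

```c
#include <string.h>

#define NAME_LIMIT 14   /* file name limit described above */

/* Split path into components stored in comp[]. *from_root is set to 1
   if the search begins at the root directory and 0 if it begins at the
   current working directory. Returns the number of components, or -1
   if a component is too long or there are more than maxcomp of them. */
int split_path(const char *path, char comp[][NAME_LIMIT + 1],
               int maxcomp, int *from_root)
{
    int n = 0;

    *from_root = (path[0] == '/');
    while (*path != '\0') {
        size_t len;

        while (*path == '/')            /* skip virgules */
            path++;
        if (*path == '\0')
            break;
        len = strcspn(path, "/");       /* length of this component */
        if (len > NAME_LIMIT || n >= maxcomp)
            return -1;
        memcpy(comp[n], path, len);
        comp[n][len] = '\0';
        n++;
        path += len;
    }
    return n;
}
```

With "/foxtrot/uniform/charlie" this yields three components and a search starting at the root; "kilo" yields one component searched for in the current directory; "/" yields no components at all, which corresponds to the root itself.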

Special files provide the means of handling I/O devices.
Each device supported by UNIX, including communications
lines and main memory, is associated with one or more
special files. These files can be read or written in the
same manner as ordinary files except that the result is the
activation of the appropriate device. All special files
reside in directory "/dev".


The access control or protection scheme in UNIX is
relatively simple. Each user known to the system has a
unique user number called the user id. When a file is
created, the creator's user id is associated with it and
protection bits are set indicating which users have
permission to read, write or execute the file. A facility
called set-user-id is provided for executable files: when
such a file is executed, the resulting process assumes the
user id of the owner of the executable file. This enables a
system program executed by a user to access files which the
user cannot directly access himself. Since anyone may cause
his executable files to use set-user-id, this feature is
generally available to provide protected access to files.
The system recognizes one user id (the "super user") as
being free of any access restrictions. The major flaw in the
UNIX protection scheme is that there is no way to monitor or
lock out simultaneous opening of a file by multiple programs
with access rights to the file. The system's authors contend
that these features are neither necessary nor sufficient for
integrity controls [Ref. 1]. However, their reasoning
behind declaring the features unnecessary was that "we are
not faced with large single-file data bases maintained by
independent processes".
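A simplified form of the access decision can be expressed with the permission and set-user-id bits that later POSIX systems define in `<sys/stat.h>`. This sketch ignores group permissions, treats user id 0 as the super user, and uses hypothetical function names; it is not the UNIX kernel's own check.

```c
#include <sys/stat.h>
#include <sys/types.h>

/* Decide whether user uid may read a file owned by owner whose mode
   bits are mode. Group permissions are ignored for simplicity. */
int may_read(uid_t uid, uid_t owner, mode_t mode)
{
    if (uid == 0)                   /* super user: no restrictions */
        return 1;
    if (uid == owner)
        return (mode & S_IRUSR) != 0;
    return (mode & S_IROTH) != 0;   /* any other user */
}

/* Nonzero if an executable file is marked set-user-id, so that a
   process executing it assumes the user id of the file's owner. */
int is_setuid(mode_t mode)
{
    return (mode & S_ISUID) != 0;
}
```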

2. Input/Output (I/O) Calls.

Under UNIX, I/O calls are designed to eliminate the
differences between the various devices and forms of access.
The file system organizes all media space into 512-byte
blocks, which are its smallest readable and writable unit.
Consequently, reads and writes of 512 bytes starting on a
512-byte boundary are the most efficient. However, no
logical record size is imposed by the system, nor is there
any distinction between random and sequential access. To
read or write an existing file, an "open" call must be made.
This system call is passed a path name and returns a number,
called a file descriptor, which identifies the open file to
the system. The file descriptor is used in subsequent I/O
calls. To create and open a file, a "creat" call must be
made. This call requires parameters which specify the file
name and access mode, and it returns a file descriptor. A
"creat" on an existing file truncates it to zero length. An
open file may be accessed via "read" and "write" calls.
These system calls require the file descriptor, the location
of a read/write buffer and the length of the buffer.
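The calls described above survive in POSIX under the same names. The sketch below (the function `block_roundtrip` is hypothetical) creates a file with `creat`, writes one 512-byte block, reopens the file with `open`, and reads the block back with `read`, passing the file descriptor, buffer and length as the text describes.

```c
#include <fcntl.h>
#include <unistd.h>

#define BLOCK 512   /* file system block size described above */

/* Create (or truncate) path, write one block from in, then reopen the
   file and read the block into out. Returns the number of bytes read,
   or -1 on error. */
ssize_t block_roundtrip(const char *path, const char in[BLOCK],
                        char out[BLOCK])
{
    ssize_t n;
    int fd = creat(path, 0644);     /* mode rw-r--r-- */

    if (fd < 0)
        return -1;
    if (write(fd, in, BLOCK) != BLOCK) {
        close(fd);
        return -1;
    }
    close(fd);

    fd = open(path, O_RDONLY);      /* returns a file descriptor */
    if (fd < 0)
        return -1;
    n = read(fd, out, BLOCK);       /* descriptor, buffer, length */
    close(fd);
    return n;
}
```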

To enable random access to appropriate files, the "seek"
call is provided. This system call merely changes the
read/write pointer associated with an open file. The
read/write pointer contains the byte offset from the
beginning of the file at which the next access will begin.
Other system calls exist for such file manipulations as
closing a file, finding the status of a file, changing the
protection mode or owner of a file, creating or removing a
directory, making a link to an existing file and deleting a
file.
3. Processes and Images.

An image is an entire computer execution environment,
including the main memory image, general register values,
the status of open files and the identity of the current
directory. Thus the image constitutes a state vector of a
process which contains all the information necessary to
resume execution of the process. A process is the execution
of an image on the virtual machine which the system imposes
on the hardware. The virtual address space of a process is
divided into three logical segments: the program text
(instructions and constants), data and stack. Pure text is
read-only for the user, while the data and stack segments
may expand or contract in size.

A process comes into existence through a "fork" call
executed by another process. This system call creates an
exact duplicate of the image of the calling process. The
only difference between the processes is that one is
considered the parent and the other the child. Both execute
as if returning from the "fork" call. The parent receives as
a return value a number called the process id, which
uniquely identifies the child; the child receives zero as
its return value. Synchronization between parent and child
is provided by the "wait" call. When a process with children
executes a "wait", its execution is suspended until one of
its children terminates. The return value of the "wait" is
the process id of the terminated child. Interprocess
communication is provided by the "pipe" call. This system
call sets up a channel which can be read or written by any
process which has as an ancestor the process that executed
the "pipe" call.
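The following minimal POSIX sketch shows these three calls working together: the parent creates a pipe before forking so that both processes share it, the child (for which `fork` returns zero) writes a message and exits, and the parent reads the message and then waits for the child. The function name `child_message` is hypothetical.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that sends msg to its parent through a pipe. The parent
   stores what it receives in buf (NUL-terminated) and returns the
   child's process id if wait() reports it, or -1 on error. */
pid_t child_message(const char *msg, char *buf, size_t bufsize)
{
    int fd[2];                   /* fd[0]: read end, fd[1]: write end */
    pid_t pid;
    ssize_t n;

    if (bufsize == 0 || pipe(fd) < 0)
        return -1;
    pid = fork();
    if (pid < 0) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }
    if (pid == 0) {              /* child: fork returned zero */
        close(fd[0]);
        write(fd[1], msg, strlen(msg));
        _exit(0);
    }
    close(fd[1]);                /* parent: fork returned the child's id */
    n = read(fd[0], buf, bufsize - 1);
    close(fd[0]);
    buf[n > 0 ? n : 0] = '\0';
    /* Suspend until a child terminates; wait returns its process id. */
    return wait(NULL) == pid ? pid : -1;
}
```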

The "exec" system call is provided to allow the execution
of a program (i.e., an executable file). The "exec" call
needs a path name to the file as its argument. A process
executing an "exec" has all its code, data and stack space
overlaid by the referenced program if the call succeeds.
Open files, the current directory and interprocess
relationships remain unchanged. A return from the "exec"
occurs only if the call is unsuccessful. Termination of a
process can be accomplished via an "exit" system call. When
an "exit" is executed, the process and its associated image
cease to exist.
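Combining `fork`, `exec`, `wait` and `exit` gives the usual way of running a program in a child process. In this POSIX sketch the hypothetical function `run_program` uses `execv`, one member of the later exec family, and `waitpid`, a refinement of `wait` that waits for one particular child; the line after `execv` runs only if the call fails.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run the program at path with argument vector argv (argv[0] first,
   NULL-terminated). Returns the child's exit status, or -1 on error. */
int run_program(const char *path, char *const argv[])
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        execv(path, argv);   /* overlays the child's image on success */
        _exit(127);          /* reached only if the exec failed */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```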

4. The C Language.

C is the programming language primarily used under UNIX.
Most of UNIX itself is coded in C. C provides modern control
structures to allow structured, GOTO-less coding. Its design
objectives were to give shorter and clearer code, encourage
modularity and good program organization, and provide
facilities for many different types of data, including
pointers and character strings.

A C program consists of a group of functions (one of which
must be named "main") and possibly some external data
declarations. Parameters may be passed between functions via
call and return arguments or through external data items. C
is not a block structured language in that functions cannot
be defined locally to other functions and external data
names may not be redeclared locally to a function. However,
the block structured language feature of allowing a group of
statements to be considered as a single statement is
included. This grouping is accomplished by enclosing the
statements within "{" and "}".

The basic data types in C are "int", "char", "float",
"double" and "struct". In addition, arrays of or pointers to
any of these types can be declared. On the PDP-11, items of
type "int" are 16-bit two's complement integers. Items of
type "char" are 8-bit values which can be interpreted as
characters or as two's complement integers. Strings are
represented as arrays of characters. Items of type "float"
or "double" are binary floating point numbers of length 32
and 64 bits respectively. An item of type "struct" consists
of a group of item declarations (possibly including arrays)
which can be viewed as a unit. This latter capability
provides for user definition of a theoretically unlimited
number of data types.
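The following sketch shows a user-defined type built with "struct", containing an array of characters used as a string, together with functions that take a pointer to the structure. The type and field names are invented for illustration, and the code is written in modern C (for instance `+=` rather than the older `=+` operator); type sizes on current machines generally differ from the PDP-11 sizes given above.

```c
#include <string.h>

/* A user-defined type grouping several items, including an array. */
struct employee {
    char   name[20];   /* string: an array of characters */
    int    dept;
    double salary;
};

/* Copy name into the fixed-size array, always leaving it terminated. */
void set_name(struct employee *e, const char *name)
{
    strncpy(e->name, name, sizeof e->name - 1);
    e->name[sizeof e->name - 1] = '\0';
}

/* Structures are usually handed to functions by pointer. */
void raise_salary(struct employee *e, double amount)
{
    e->salary += amount;
}
```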

C provides a large number of binary and unary arithmetic
and logical operators. The arithmetic operations provided
include addition and subtraction, multiplication and
division, incrementation and decrementation, and bit-wise
OR, AND and complement. Logical operators allow expressions
to be compared, logically AND'ed, logically OR'ed and
logically complemented. No distinction exists between a
logical expression and an arithmetic expression. Any
expression has a true value if and only if it evaluates to a
non-zero value.
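Because any non-zero value counts as true, arithmetic results can be used directly as conditions, and the logical operators yield 1 or 0, which can in turn be used in arithmetic. The two small functions below are hypothetical examples of this.

```c
/* Return 1 if bit number n is set in word w, else 0. The bit-wise AND
   yields an arithmetic value that is then tested as a truth value. */
int bit_is_set(unsigned w, int n)
{
    if (w & (1u << n))
        return 1;
    return 0;
}

/* Logical operators always produce 0 or 1, so their results can be
   added: this counts how many of a, b and c are non-zero. */
int count_true(int a, int b, int c)
{
    return !!a + !!b + !!c;
}
```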

Assignment statements in C are unusual in the following
ways. An assignment statement can be used as an expression
and has the value that was assigned to the variable on the
left hand side. A number of assignment operators exist which
cause a binary operation to take place between the left hand
side and the evaluated right hand side prior to storage of
the value (e.g. "x =+ 2;" adds two to "x"; later versions of
C spell this operator "+=").
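Both properties appear in a common C idiom, shown here in modern syntax: an assignment inside a loop test stores a character and also supplies the value that is compared, and `+=` updates a counter. The function `line_length` is a hypothetical example built on the standard `getc`.

```c
#include <stdio.h>

/* Count the characters on the next line of fp. The assignment
   (c = getc(fp)) is itself an expression whose value is compared. */
int line_length(FILE *fp)
{
    int c, n = 0;

    while ((c = getc(fp)) != EOF && c != '\n')
        n += 1;              /* assignment operator: n = n + 1 */
    return n;
}
```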

The major control statements in C are "while", "do-while",
"for", "switch", "goto", "break" and "continue". The "while"
statement causes execution of a group of statements as long
as an expression is true. A "do-while" statement is like a
"while" except that the control expression is evaluated
after the execution of the group of statements; therefore,
the body of a "do-while" statement is always executed at
least once. The "for" statement is an extension of the
"while" which provides control variable initialization and
loop incrementation. The "switch" statement allows the
execution of one of a group of statements labeled as cases,
based on the value of an expression. The "goto" statement
transfers control to a labeled statement in the usual
fashion. "Break" and "continue" provide label-free loop
termination and skipping.
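The short hypothetical functions below exercise "for", "continue", "break", "switch" and "do-while" as described above.

```c
/* Sum a[0..n-1], skipping negative values and stopping at the first
   zero: "for" with "continue" and "break". */
int sum_until_zero(const int *a, int n)
{
    int i, sum = 0;

    for (i = 0; i < n; i++) {
        if (a[i] < 0)
            continue;      /* skip to the next iteration */
        if (a[i] == 0)
            break;         /* leave the loop without a label */
        sum += a[i];
    }
    return sum;
}

/* Choose one of several cases on the value of an expression. */
const char *sign_name(int v)
{
    switch ((v > 0) - (v < 0)) {
    case 1:
        return "positive";
    case -1:
        return "negative";
    default:
        return "zero";
    }
}

/* The body of a do-while runs at least once, so 0 has one digit. */
int count_digits(unsigned n)
{
    int d = 0;

    do {
        d++;
        n /= 10;
    } while (n != 0);
    return d;
}
```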
A subroutine library is provided for use with C programs.
It contains system calls for I/O and other functions. In
addition, it contains routines for formatted output and for
the standard mathematical functions. For a more complete
description of C, see Refs. 10 and 18.


III. IMPLEMENTATION OF THE CODASYL DESIGN.


A. Implementation Philosophy.

The overriding consideration in implementing the DBMS was
to avoid any modifications or additions to the existing UNIX
facilities. This decision was made for a number of reasons.
First, other research is being conducted in the Computer
Laboratory using the UNIX operating system as a research
tool. Running systems which require non-standard versions of
UNIX interferes with the controlled environment and generally
makes other operating system modifications more difficult.
Second, a modification to the operating system must be
re-applied whenever a new release of UNIX is installed.
Third, the chances of the DBMS being transported to other
UNIX sites are far greater if it runs under a standard UNIX.
Finally, the research goal of determining whether the DBMS
could be implemented in a variety of environments would be
subverted by modifying the operating system environment.

Since the most notable feature of UNIX is the design of its
file system, it was decided to use the file system whenever
possible rather than acquiring a large block of physical
media space and letting the DBMS manage it. This philosophy
was expected to simplify the problem of mapping data to
media, thereby reducing the size and complexity of the DBMS,
and to insulate the DBMS from changes in the hardware.
The final guideline was to implement as large a useful
subset of the features in the CODASYL design as feasible
under the above assumptions. Creative extensions to the
CODASYL design were avoided, since these would tend to
obscure the research goal of being able to measure the
utility of the CODASYL network model against the INGRESS
relational model. Efforts were directed instead to the
realization of the goals of the CODASYL DDLC, which are very
ambitious in themselves.

None of the above assumptions should be taken as precluding
future modifications to enhance the implementation of either
UNIX or the features of the CODASYL design. The intent of
the implementation philosophy described herein was to
produce a standard CODASYL DBMS running under a standard
UNIX for use as a baseline product.

B. Organization of a Data Base.

Virtually all information about the data base described by
a particular schema is contained in a special directory. The
only exceptions are certain files which are created for the
life of a user process and then discarded. The data base,
its schema and its directory all have the same name.
Although it is possible for directories with the same name
to exist in a UNIX file system, no two data bases should
have the same name. The files within the directory
associated with a data base (called a schema directory)
contain the source and object schemas, the schema Data Base
Manager (DBM) program and all the non-temporary data within
the data base. Specific files will be mentioned when they
are relevant to the discussion. Appendix B contains a
complete listing of the files associated with a schema.

C. Operating Environment.

The environment for both system maintenance and user access
of the data base is provided by the DBM Request Processor
("dbm"). This program is a general purpose command language
processor used to provide an interface with any data base.
Appendix C contains a description of the functions of the
DBM Request Processor.

When a user wishes to execute a program which accesses a
data base, he executes dbm and specifies the appropriate
schema name. He then gives dbm an "x" command and specifies
a path name to the user program and arguments to be passed
to the program. Dbm opens two pipes as interprocess
communication channels and forks off two children. Through
"exec" calls, these processes become the schema DBM program
and the requested user program respectively. Both programs
are passed the file descriptors of their respective ends of
the interprocess communication pipes as part of their
calling arguments. The child destined to become the schema
DBM changes directories ("chdir") to the schema directory
prior to executing the schema DBM.
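The start-up sequence just described can be sketched with the same POSIX calls. This is a hypothetical reconstruction, not the thesis's code: the function `launch`, the convention of passing each child its read and write descriptors as decimal strings in its first two arguments, and the omission of the user's extra arguments are all assumptions.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* In a child: exec path with its read and write descriptors passed as
   decimal strings in argv[1] and argv[2] (an assumed convention). */
static void exec_with_fds(const char *path, int rfd, int wfd)
{
    char r[16], w[16];

    snprintf(r, sizeof r, "%d", rfd);
    snprintf(w, sizeof w, "%d", wfd);
    execl(path, path, r, w, (char *)NULL);
    _exit(127);                        /* reached only if exec failed */
}

/* Start the schema DBM (in schema_dir) and the user program, connected
   by two pipes, then wait until both have terminated.
   Returns 0 on success or -1 if a pipe or fork fails. */
int launch(const char *schema_dir, const char *dbm_path,
           const char *user_path)
{
    int to_dbm[2], to_user[2];         /* [0] read end, [1] write end */
    pid_t dbm, user = -1;

    if (pipe(to_dbm) < 0)
        return -1;
    if (pipe(to_user) < 0) {
        close(to_dbm[0]);
        close(to_dbm[1]);
        return -1;
    }
    dbm = fork();
    if (dbm == 0) {                    /* child destined to be the DBM */
        close(to_dbm[1]);
        close(to_user[0]);
        if (chdir(schema_dir) < 0)
            _exit(126);
        exec_with_fds(dbm_path, to_dbm[0], to_user[1]);
    }
    if (dbm > 0) {
        user = fork();
        if (user == 0) {               /* child destined to be the user program */
            close(to_dbm[0]);
            close(to_user[1]);
            exec_with_fds(user_path, to_user[0], to_dbm[1]);
        }
    }
    /* The parent keeps no pipe ends, so each side sees end of file
       once the other side has terminated. */
    close(to_dbm[0]);
    close(to_dbm[1]);
    close(to_user[0]);
    close(to_user[1]);
    while (wait(NULL) > 0)
        ;                              /* wait for every child */
    return (dbm > 0 && user > 0) ? 0 : -1;
}
```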
Once the user and the schema DBM programs are established,
dbm waits until they have terminated before accepting any
more commands. During execution, the user program and the
schema DBM pass requests and data through the two pipes,
with the user program executing in the user's working
directory and the schema DBM executing in the schema
directory. It is possible for other users to execute
concurrently using the same data base; however, each user
has his own version of the schema DBM and a separate set of
interprocess pipes.

This operating environment differs from the one envisioned
by the CODASYL DDLC in that each user process is interfaced
to its own copy of the DBMS routines. Each copy has its own
buffers and no knowledge of the existence of other copies,
except that which it can derive from the state of files
within the schema directory. In contrast, the CODASYL
designers described implementation of a single copy of the
DBMS routines which would concurrently communicate with all
the users and have communal system buffers for servicing all
user requests [Ref. 2]. The reasons for this difference are
twofold.

First, even though pipes are the only reasonable method for
interprocess communication, they are limited in that two
processes may communicate only via a pipe opened by a common
ancestor. In general, the only common ancestor of processes
spawned by different users is UNIX itself, and although a
mechanism exists for finding the process id of a process
("ps"), no mechanism exists for requesting UNIX to open a
pipe to a designated process.

Second, even if a mechanism existed to connect a single copy
of the schema DBM to all users, that single copy could not
muster sufficient resources to service them. In particular,
a process may have only fourteen simultaneously open files,
and each user would require two files open (its pipes) on a
dedicated basis. Thus, since access to an area requires two
files to be open, a single schema DBM having several users,
each requiring several areas, would develop a thrashing
condition in which almost every access to the data base
would incur the overhead of two file opens and two file
closes. Additionally, a problem with memory buffer
contention might arise, although this problem would
probably be less critical.
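The arithmetic behind this argument can be made explicit. Taking the figures in the text (fourteen open files per process, two pipe ends per user and two files per open area) and ignoring any other descriptors the DBM might need, the number of areas a single shared DBM could hold open at once is computed by the hypothetical function below.

```c
#define MAX_OPEN_FILES 14   /* per-process limit quoted above */
#define FILES_PER_USER 2    /* the two pipe ends for each user */
#define FILES_PER_AREA 2    /* files needed to access one area */

/* Areas that can stay open simultaneously when one DBM serves `users`
   users; returns 0 when the pipes alone exhaust the limit. */
int max_open_areas(int users)
{
    int left = MAX_OPEN_FILES - FILES_PER_USER * users;

    return left > 0 ? left / FILES_PER_AREA : 0;
}
```

With four users, for example, only three areas could be open at a time, so a user needing a fourth area would force files to be closed and reopened.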

The existence of separate copies of the schema DBM does not
mean that the program must be duplicated in memory for each
of its current users. UNIX provides a facility for processes
executing the same program to share the same text segment;
thus only the data and stack segments are replicated for
each process. The consequences of the multiple schema DBM
environment will be discussed later.


D. Source and Object Schemas.

When a new data base is to be created, a source schema
description must be prepared. A schema directory should be
created (using the UNIX command "mkdir") to contain the
source language version of the schema. The source schema
description must reside in a file, located in the schema
directory, whose name is formed by prefixing the schema name
with "s.". The UNIX text editor ("ed") is suitable for
entering the source schema description. The source schema
is coded in a modified form of the CODASYL DDL described in
Ref. 2. Differences between the DDL of Ref. 2 and the UNIX
DBMS DDL are discussed in Appendix G.

Once the source description of the schema is entered, it
must be compiled into an object version. This compilation is
accomplished via the "c" command of the DBM Request
Processor. The object version of the schema consists of two
files, which contain the schema DBM program and the encoded
schema description respectively. The schema DBM interprets
and services all user requests for access to the data base,
while the encoded schema description is a compact symbolic
form of the schema's structure. The name of the file
containing the schema DBM is the schema name prefixed by
"dbm.". The schema DBM is discussed in Section III.L below.
The encoded schema description file is used to initialize
the schema DBM program and to provide information about the
data base during the move and garbage collection functions
of the DBM Request Processor. Appendix D contains a
description of the format of the schema description file.

E. Interprocess Communication.

The schema DBM and the user process communicate via the
pipes set up for them by dbm. These pipes may be read and
written just as if they were ordinary open files. Messages
of a predefined format are sent and received by both
processes. The first message sent is the initial call
message from the user process. This message is triggered by
the C DML "permit" function and contains an encoded
description of the sub-schema. The schema DBM response to
the initial call includes the index numbers for all the
entities and attributes contained in the sub-schema
description. Subsequent user program messages are requests
for data retrieval or update and are made using the index
numbers acquired in the initial call. Since the schema DBM
will receive an end-of-file condition when trying to read
the interprocess channel after the user program terminates,
no indication need be given to the schema DBM that the user
program has terminated.
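The end-of-file convention can be illustrated with a read loop: once every write end of a pipe has been closed, `read` returns zero, so a server can detect that its client has gone without any explicit message. In the sketch below, the function `drain_requests` and the fixed 4-byte request size are assumptions for illustration only; the actual message formats are those of Appendix E.

```c
#include <sys/types.h>
#include <unistd.h>

#define REQ_SIZE 4   /* assumed fixed request length, for illustration */

/* Read fixed-size requests from fd until end of file. Returns the
   number of complete requests received, or -1 on a read error. */
int drain_requests(int fd)
{
    char req[REQ_SIZE];
    size_t have = 0;     /* bytes of the current request received so far */
    int count = 0;
    ssize_t n;

    for (;;) {
        n = read(fd, req + have, REQ_SIZE - have);
        if (n < 0)
            return -1;
        if (n == 0)      /* end of file: every writer has closed the pipe */
            return count;
        have += (size_t)n;
        if (have == REQ_SIZE) {
            count++;     /* a real server would service the request here */
            have = 0;
        }
    }
}
```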

Messages sent by the schema DBM fall into two categories:
normal responses and error messages. The error codes in
error messages correspond to those used by the C DML. For a
description of the format of all the interprocess messages,
see Appendix E.


F. Data Base Keys.

In the CODASYL DBMS each record must be identified by a
unique value called its data base key. This key is assigned
when the record is created and remains with it for the life
of the record. A quick and unambiguous mapping from a
record's data base key to the record must be provided, since
the key is used for direct access. The key's order relative
to all other keys in the area must be well defined, since it
is used for sequential access. However, the record is
allowed to move around in physical media space as long as it
stays within the same area. Section III.G below discusses
how the problem of satisfying all the criteria for data base
keys was resolved.

The format of a data base key is shown in Fig. 1. This
format makes possible 255 areas (area zero is the null area),
each containing up to 16,777,215 records.


Bits:     31         24   23                          0
Fields:   !  area #    !   record # in area           !

                  Data Base Key Format.
                       Figure 1.


When a record is first created and assigned to an area,
that area's index number becomes part of its data base key.
It is therefore not possible for a record to migrate to
another area. The record number is a purely logical ordering
and is implemented as described below.
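The layout of Figure 1 amounts to a shift and a mask on a 32-bit value. In this sketch the type `dbkey` and the three functions are hypothetical names, and `uint32_t` from C99's `<stdint.h>` is used for clarity; on the PDP-11 a 32-bit "long" would serve the same purpose.

```c
#include <stdint.h>

typedef uint32_t dbkey;

#define AREA_SHIFT  24
#define RECORD_MASK 0x00FFFFFFu   /* bits 23-0: record number in area */

/* Combine an area number (bits 31-24) and a record number (bits 23-0). */
dbkey make_dbkey(unsigned area, uint32_t record)
{
    return ((dbkey)(area & 0xFFu) << AREA_SHIFT) | (record & RECORD_MASK);
}

unsigned dbkey_area(dbkey k)
{
    return (unsigned)(k >> AREA_SHIFT);
}

uint32_t dbkey_record(dbkey k)
{
    return k & RECORD_MASK;
}
```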

G. Area Handling.

Each area specified in the schema has two files associated
with it. The first of these is the file containing the data
stored in the area; its name is the same as the name of the
area. The second file is the data base key file for the
area; its name is the area name prefixed by "k.".
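The naming conventions of this chapter can be gathered into one helper. The function `schema_file_name` and its enumeration are hypothetical, and the sketch assumes the key-file prefix is "k."; names are built relative to the schema directory.

```c
#include <stdio.h>

enum schema_file { SOURCE_SCHEMA, SCHEMA_DBM, AREA_DATA, AREA_KEYS };

/* Build the name, relative to the schema directory, of one of the files
   described in this chapter. name is the schema name for SOURCE_SCHEMA
   and SCHEMA_DBM, and the area name for AREA_DATA and AREA_KEYS.
   Returns the snprintf result; a value >= size means truncation. */
int schema_file_name(char *buf, size_t size, enum schema_file kind,
                     const char *name)
{
    static const char *const prefix[] = { "s.", "dbm.", "", "k." };

    return snprintf(buf, size, "%s%s", prefix[kind], name);
}
```

Under the 14-character file name limit described earlier, these prefixes would leave at most 10 characters for a schema name (because of "dbm.") and 12 for an area name (because of "k.").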

The data base key file is organized into 24-bit entries.
Each entry contains the starting byte offset, within the area
data file, of the record associated with a particular data
base key for that area. The data base keys are mapped to
the entries sequentially; that is, multiplying the record
number portion of the data base key by three yields the
starting byte offset of the entry in the key file associated
with that data base key. If the value of the key file entry
for a data base key is zero, then that key is null (i.e.,
unassigned). The first entry slot in the data base key file
for each area is reserved for storing the highest used key
in the area, to facilitate sequential searching of the area.
Thus record number zero is undefined in each area.
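The key-file mapping can be sketched as follows, over an in-memory copy of the key file. The byte order within a 24-bit entry is not stated in the text; this sketch assumes low byte first, and the function names are hypothetical.

```c
#include <stdint.h>

#define KEY_ENTRY_SIZE 3   /* 24-bit key file entries */

/* Byte offset of record number recno's entry within the key file. */
static long key_entry_offset(uint32_t recno)
{
    return (long)recno * KEY_ENTRY_SIZE;
}

/* Decode one entry: the starting byte offset of the record in the
   area data file (byte order assumed low byte first). */
static uint32_t get_entry(const unsigned char *keyfile, uint32_t recno)
{
    const unsigned char *p = keyfile + key_entry_offset(recno);
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) | ((uint32_t)p[2] << 16);
}

static void put_entry(unsigned char *keyfile, uint32_t recno, uint32_t value)
{
    unsigned char *p = keyfile + key_entry_offset(recno);
    p[0] = value & 0xFF;
    p[1] = (value >> 8) & 0xFF;
    p[2] = (value >> 16) & 0xFF;
}

/* A zero entry means the key is unassigned; slot 0 is reserved for the
   highest used key, so record number 0 is never a real record. */
static int key_is_null(const unsigned char *keyfile, uint32_t recno)
{
    return recno == 0 || get_entry(keyfile, recno) == 0;
}
```

Since the first four bytes of the area data file hold the waste count described below, no record can begin at offset zero, which is consistent with using a zero entry as the null marker.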

Data records are stored in the data file with a three-word
(six-byte) prefix. The first three bytes of the prefix contain
the record number portion of the data base key for that
record. The fourth byte contains the record type (a total
of 255 types are possible). The last two bytes of the pre-
fix contain the size in bytes of the record (the maximum record
size is therefore 32767 bytes).
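A possible encoding of the six-byte record prefix is sketched below. Storing the 16-bit size low byte first matches the PDP-11's word order; the byte order of the 24-bit record number, the struct and the function names are assumptions.

```c
#include <stdint.h>

#define PREFIX_SIZE 6          /* three 16-bit PDP-11 words */
#define MAX_RECORD_SIZE 32767  /* size is held in a signed 16-bit word */

struct rec_prefix {
    uint32_t recno;   /* record-number part of the data base key (24 bits) */
    unsigned type;    /* record type (one byte) */
    unsigned size;    /* record size in bytes */
};

/* Encode into 6 bytes; returns 0 on success, -1 if a field does not fit. */
static int encode_prefix(unsigned char out[PREFIX_SIZE], const struct rec_prefix *rp)
{
    if (rp->recno > 0xFFFFFFu || rp->type > 0xFFu || rp->size > MAX_RECORD_SIZE)
        return -1;
    out[0] = rp->recno & 0xFF;          /* assumed low byte first */
    out[1] = (rp->recno >> 8) & 0xFF;
    out[2] = (rp->recno >> 16) & 0xFF;
    out[3] = (unsigned char)rp->type;
    out[4] = rp->size & 0xFF;           /* PDP-11 word: low byte first */
    out[5] = (rp->size >> 8) & 0xFF;
    return 0;
}

static void decode_prefix(const unsigned char in[PREFIX_SIZE], struct rec_prefix *rp)
{
    rp->recno = (uint32_t)in[0] | ((uint32_t)in[1] << 8) | ((uint32_t)in[2] << 16);
    rp->type  = in[3];
    rp->size  = (unsigned)in[4] | ((unsigned)in[5] << 8);
}

/* When a record is moved or deleted, the first two words (four bytes)
   of the prefix at its former location are zeroed. */
static void mark_vacated(unsigned char prefix[PREFIX_SIZE])
{
    prefix[0] = prefix[1] = prefix[2] = prefix[3] = 0;
}
```

Zeroing the first two words clears the record number and type but leaves the size, so a scan of the data file could presumably still step over the hole; that reading is an inference, not something the text states.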

As records are created and added to the area, they are
entered sequentially in the area data file following the
last record written (data base keys may be assigned by any
algorithm, however). The records will remain in their ori-
ginal locations until they are deleted or moved during gar-
bage collection. If a record is moved to a new location,
its key file entry is updated accordingly. Whenever a
record is moved or deleted from the area data file, the
first two words of the prefix at its former location are
zeroed.

The positioning control mechanism provided in the schema
DDL is implemented via data base keys. Area control is
handled trivially, since the area index number is part of
the data base key. The positioning of a record "near"
another record is accomplished by assigning the record being
added the next available data base key following the data
base key of the record it is to be "near". This method
speeds access to records clustered "near" one another when
they are used together, since their data base key file
entries are likely to be in the same block. Additionally,
the garbage collection function of the DBM Request Processor
automatically re-sequences the records in the area data file
into ascending order of data base key. After garbage
collection, the records are clustered in the area data file
as well.
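The "near" placement rule amounts to a search for the next null entry after the target's record number. This sketch models the key file as an array of offsets (a hypothetical simplification) and also shows why neighboring keys tend to share a block: with 512-byte blocks, about 170 three-byte entries fit per block, and an entry may straddle a block boundary (entry 170 occupies bytes 510 to 512).

```c
#include <stdint.h>

#define KEY_BLOCK 512   /* UNIX block size in bytes */

/* key[r] is the data-file offset of record r (0 = null); key[0] holds the
   highest used record number. nkeys bounds this in-memory model. */
static uint32_t near_key(const uint32_t key[], uint32_t nkeys, uint32_t near)
{
    for (uint32_t r = near + 1; r < nkeys; r++)
        if (key[r] == 0)
            return r;       /* next available record number after the target */
    return 0;               /* record number 0 is never valid: signals failure */
}

/* Key-file block holding the first byte of record r's 3-byte entry. */
static uint32_t key_block(uint32_t r)
{
    return r * 3 / KEY_BLOCK;
}
```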

If a data base key assignment algorithm produces a sparse
key space, the storage needed for recording the key entries
is minimized by the fact that UNIX allocates storage only
for blocks actually accessed. For example, if a data base
key were allocated whose key entry lay 200 blocks beyond
the current end of the data base key file, only the block
containing that entry would be allocated. Even though the
apparent size of the file would have increased by 200
blocks, the intervening 199 blocks would not be assigned any
physical media space. Unfortunately, if an empty block is
read, space for it is allocated. This means that if the
area in the above example were ever scanned sequentially,
all the non-allocated blocks in the data base key file would
be allocated.
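A sparse key file can be produced with ordinary `lseek` and `write` calls: seeking past the end of a file and writing leaves a hole that reads back as zeroes. The sketch below writes one 3-byte entry (low byte first, an assumption) and reports the file's apparent and allocated sizes via `fstat`. The unit of `st_blocks` is 512 bytes on most systems but is not fixed by POSIX, and whether a hole consumes space depends on the file system; the function names are hypothetical.

```c
#include <fcntl.h>
#include <stdint.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Write record number recno's entry at byte recno * 3. Writing past the
   current end of file leaves a hole in between. */
static int write_key_entry(int fd, uint32_t recno, uint32_t offset)
{
    unsigned char e[3] = {
        (unsigned char)(offset & 0xFF),          /* low byte first (assumed) */
        (unsigned char)((offset >> 8) & 0xFF),
        (unsigned char)((offset >> 16) & 0xFF),
    };
    if (lseek(fd, (off_t)recno * 3, SEEK_SET) == (off_t)-1)
        return -1;
    return write(fd, e, sizeof e) == (ssize_t)sizeof e ? 0 : -1;
}

/* Apparent size versus space actually allocated to the file. */
static int key_file_usage(int fd, off_t *apparent, off_t *allocated)
{
    struct stat st;
    if (fstat(fd, &st) != 0)
        return -1;
    *apparent = st.st_size;
    *allocated = (off_t)st.st_blocks * 512;   /* 512-byte units on most systems */
    return 0;
}
```

Writing record 40000's entry into a nearly empty file gives an apparent size of 120,003 bytes, while on a file system with sparse-file support the allocated size stays at a block or two.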

Because of the deletion and addition algorithms, gaps will
develop in the data file during the course of processing.
The total size (in bytes) of these gaps is maintained in the
first four bytes of the area data file. During the execu-
tion of the schema DBM, the amount of wasted space is accu-
mulated, and at the end of the run the area data file waste
count is incremented by that amount. Since this method
permits more than two billion waste bytes to be accumulated,
no provision is made for overflowing the waste count. When
the waste count reaches an unacceptable size, the DBM Request
Processor can be used to effect garbage collection.

Areas which are designated as temporary areas are han-
dled in a slightly different manner. Since a temporary area
is local to the user process opening it, the file names for
such an area are suffixed with the process id of the user
process. Since the process id uniquely identifies the pro-
cess, these names uniquely identify a particular version of
a temporary area. Additionally, the files associated with a
temporary area are allocated in the "tmp" directory and are
deleted when the process is terminated. The "tmp" directory
has the characteristic that if a system crash occurs, the
files within it are lost.

The files associated with any area are automatically
created by the schema DBM if it attempts to open them and
they do not exist. This means that when a schema is first
created, its areas will come into being automatically as
soon as they are needed.

H. Access Methods.

There are five access methods which may be used for
locating a record in the data base: direct, sequential,
calculated, chained and indexed.

1. Direct Access.

If the data base key of a record is known, the record may be
accessed directly using the data base key mapping mechanism
described above. Every access to the data base ultimately
involves direct access once the data base key is known.
Unadorned direct access is provided to the user through
record currency and through explicit key record selection
expressions (see Section IV below).

2. Sequential Area Scan.

The "next" and "prior" records in an area are
accessed through a sequential scan. The algorithm succes-
sively increments or decrements the record number in the
current data base key until the next or previous non-null
data base key is found.
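A sketch of the sequential scan, assuming the key file has been loaded into an array indexed by record number, with slot 0 holding the highest used record number as described in Section III.G. The function name and the convention of returning 0 for "no record" are hypothetical; record number 0 is never valid, so it cannot be confused with a real record.

```c
#include <stdint.h>

/* key[r] != 0 means record r exists; key[0] is the highest used record
   number. Returns the next (dir > 0) or prior (dir < 0) non-null record
   number relative to cur, or 0 if there is none. */
static uint32_t scan_area(const uint32_t key[], uint32_t cur, int dir)
{
    uint32_t high = key[0];
    if (dir > 0) {
        for (uint32_t r = cur + 1; r <= high; r++)   /* successive increments */
            if (key[r] != 0)
                return r;
    } else {
        uint32_t r = cur > high + 1 ? high + 1 : cur;
        while (r > 1) {                              /* successive decrements */
            r--;
            if (key[r] != 0)
                return r;
        }
    }
    return 0;
}
```

Passing 0 with `dir > 0` finds the first record of the area, and passing `key[0] + 1` with `dir < 0` finds the last.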

3. Calculated or Hashed Access.

A data base key may be developed by a hashing algo-
rithm which uses data in a record as a hash key. The
schema record entry for records accessed by hashing must
have a location mode of "CALC". During record creation,
CALC key collisions are resolved by a forward linear scan
until a null key is found. A key link is established in
the synonym record to enable future access to the new
record. If multiple collisions occur on the same data base
key, a linked list is developed leading to the last synonym
added. A standard schema DBM utility routine ("randkey") is
used for hashing. If non-standard hashing for a record
is desired, a data base procedure may be specified in the
location mode clause of the record's schema entry. When a
record of a type using a non-standard hashing procedure is
to be added, the data base procedure declared in the loca-
tion mode clause for the record will be called to provide
the data base key.
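The collision handling can be illustrated with a small in-memory model. The hash function below is a hypothetical stand-in for "randkey", whose definition is not given here, and the wrap-around at the end of the area is an assumption; the forward linear scan for a null key and the synonym chain ending at the last synonym added follow the description above.

```c
#include <stdint.h>

#define NSLOTS 16   /* hypothetical area size: record numbers 1..NSLOTS-1 */

struct slot {
    int used;
    long calc;          /* value of the CALC key item */
    uint32_t synonym;   /* record number of the next synonym, 0 = end */
};

static struct slot area[NSLOTS];

/* Hypothetical stand-in for "randkey": map a CALC value to 1..NSLOTS-1. */
static uint32_t hash_calc(long v)
{
    return (uint32_t)((unsigned long)v % (NSLOTS - 1)) + 1;
}

/* Store a record by CALC value; returns its record number, 0 if full. */
static uint32_t calc_store(long v)
{
    uint32_t home = hash_calc(v), r = home;
    while (area[r].used) {                 /* forward linear scan for a null key */
        r = r % (NSLOTS - 1) + 1;          /* wraps NSLOTS-1 back to 1 (assumed) */
        if (r == home)
            return 0;
    }
    area[r].used = 1;
    area[r].calc = v;
    area[r].synonym = 0;
    if (r != home) {                       /* collision: link from the end of the chain */
        uint32_t t = home;
        while (area[t].synonym != 0)
            t = area[t].synonym;
        area[t].synonym = r;
    }
    return r;
}

/* Locate a record by CALC value: hash, then follow the synonym links. */
static uint32_t calc_find(long v)
{
    for (uint32_t r = hash_calc(v); r != 0 && area[r].used; r = area[r].synonym)
        if (area[r].calc == v)
            return r;
    return 0;
}
```

Because links are only ever appended, every record whose CALC value hashes to a given key remains reachable from that key, even when the home slot is occupied by a record that was displaced there from another chain.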

4. Chained Access.

The default method of set linkage is via chaining.
The links in a chain consist of data base keys stored in the
records to be linked. The owner record of a set contains
links to the first and last member records in the set. Each
member record contains a link to the next record in the set.
If a set is defined as "PRIOR PROCESSABLE" in the schema,
each member record also has a link to the previous member of
the set. A member record defined as "LINKED TO OWNER" in the
schema will contain a link to the owner record of the set.
The link-to-next-record in the last record of a set and the
link-to-previous-record in the first record of a set both
point to the owner record of the set.
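The chain layout can be sketched with records held in an array indexed by a hypothetical small data base key. All five link fields are shown, although a real record carries the prior and owner links only when the set is PRIOR PROCESSABLE or the member is LINKED TO OWNER. Representing an empty set by owner links that point back to the owner itself is an assumption that fits the rule that the chain ends at the owner; insertion at the end of the set is likewise just one possible set order.

```c
#include <stdint.h>

#define NREC 32

struct rec {
    uint32_t first, last;   /* owner: first and last member records */
    uint32_t next, prior;   /* member: next and (PRIOR PROCESSABLE) prior links */
    uint32_t owner;         /* member: LINKED TO OWNER link */
};

static struct rec db[NREC];  /* index = hypothetical data base key */

static void make_owner(uint32_t o)
{
    db[o].first = db[o].last = o;   /* empty set: links point back to owner (assumed) */
}

/* Insert member m as the last member of the set owned by o. */
static void insert_last(uint32_t o, uint32_t m)
{
    uint32_t oldlast = db[o].last;
    db[m].next = o;                 /* last member's next link points to owner */
    db[m].prior = oldlast;          /* previous member, or owner if set was empty */
    db[m].owner = o;
    if (oldlast == o)
        db[o].first = m;
    else
        db[oldlast].next = m;
    db[o].last = m;
}

/* Walk the set forward until the chain returns to the owner. */
static int walk_set(uint32_t o, uint32_t out[], int max)
{
    int n = 0;
    for (uint32_t k = db[o].first; k != o && n < max; k = db[k].next)
        out[n++] = k;
    return n;
}
```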

5. Indexed Access.

Sets which are singular or dynamic have indices as
their primary access method. Additionally, indices are used
for secondary set linkage to implement "SEARCH" keys defined
in the schema for a record type. An index consists of a
list of data base keys ordered as specified in an "ORDER" or
"SEARCH" clause. When record selection is through set
membership with data field values specified (see Appendix A,
Section B.1.a), an appropriate index will be used to gain
access to the record. See Section III.I below for a further
discussion of indices.

I. The Schema Index File.

All indices created in the data base are stored in the
schema index file. The name of this file is the schema name
prefixed by "index.". Each index in the file is organized
into 512-byte blocks, each of which has the format shown in
Fig. 2.


struct iblock {
    int  iblink;        /* link to previous index block */
    int  iflink;        /* link to next index block */
    char ientry[508];   /* up to 127 four-byte index entries */
};                      /* 16-bit PDP-11 ints: 2 + 2 + 508 = 512 bytes */


Format of an Index Block.
Figure 2.


When the backward link field (iblink) in the first block
of an index is zero, the index is not in use; minus one
indicates that it is in use. The forward link field
(iflink) of the last block in the index is zero. The entry
array (ientry) contains up to 127 four-byte data base keys.
Null entries at the end of an index block are all zeroes.
When an index is first created, seven empty entry slots are
left at the end of each block for future growth.

An index is searched by a modified form of binary
search. When a record is to be located via the index, the
first index block is retrieved and the records corresponding
to the first and last data base keys entered in the block
are examined. If these records bracket the desired record,
a binary search is conducted through the index block to find
the desired record. If the desired record is not associated
with the index block, subsequent index blocks are read and
the last record for each block is examined to determine if
it brackets the desired record. When the correct block is
found, a binary search of that block is used to find the
record. For an index with k blocks having an average of n
entries in each block, the average number of records exam-
ined in locating a record is approximately (k / 2) + m,
where m is the log base two of n.
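A sketch of the modified binary search follows. To keep it self-contained, each index entry is an integer standing in for the ordering value of the record its data base key points to; in the real index each comparison costs a record access, which the `examined` counter tracks for comparison with the (k / 2) + m estimate. The struct and function names are hypothetical.

```c
#define SLOTS 127   /* four-byte entries per 512-byte index block */

/* Model of one index block: val[] holds the ordering values of the
   records that the block's data base keys point to. */
struct blk {
    int n;              /* entries in use */
    int val[SLOTS];
};

static int examined;    /* records examined so far */

static int rec_value(const struct blk *b, int i)
{
    examined++;         /* each look at an entry costs one record access */
    return b->val[i];
}

/* Returns the block number holding target and sets *pos, or -1. */
static int index_search(const struct blk blocks[], int k, int target, int *pos)
{
    int bi = -1;
    if (k == 0)
        return -1;
    /* first block: examine the records for its first and last entries */
    if (blocks[0].n > 0 && rec_value(&blocks[0], 0) <= target &&
        rec_value(&blocks[0], blocks[0].n - 1) >= target)
        bi = 0;
    /* later blocks: examine only the record for the last entry */
    for (int b = 1; bi < 0 && b < k; b++)
        if (blocks[b].n > 0 && rec_value(&blocks[b], blocks[b].n - 1) >= target)
            bi = b;
    if (bi < 0)
        return -1;
    int lo = 0, hi = blocks[bi].n - 1;
    while (lo <= hi) {                     /* binary search within the block */
        int mid = lo + (hi - lo) / 2;
        int v = rec_value(&blocks[bi], mid);
        if (v == target) {
            *pos = mid;
            return bi;
        }
        if (v < target)
            lo = mid + 1;
        else
            hi = mid - 1;
    }
    return -1;
}
```

Searching for 50 in the blocks {10, 20, 30}, {40, 50, 60}, {70, 80} examines four records: the first and last of block 0, the last of block 1, then a single binary-search probe.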

When data base keys are added to the last block of an
index, a new last index block is created whenever six or
fewer empty slots remain in the block. If a block other
than the last block overflows while a data base key is being
added, a new block is inserted into the sequence of blocks.
The last seven entries of the old block are copied into the
new block and the new data base key entry is added. When-
ever the last key remaining in a block is deleted, the block
is removed from the index and freed.
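The growth rules can be sketched as below, again using integer stand-ins for the records the keys point to. Two points are interpretation rather than statement: the sketch opens a new last block only when the incoming key sorts after every entry in the current last block, and after a split it places the new key in whichever of the two blocks it sorts into. Linking the new block into the chain (iblink/iflink) is left to the caller, and all names are hypothetical.

```c
#include <string.h>

#define SLOTS 127   /* four-byte entries per 512-byte index block */
#define SLACK 7     /* empty slots left at the end of a freshly filled block */

struct blk {
    int n;              /* entries in use */
    int val[SLOTS];     /* ordering values standing in for records */
};

/* Insert v in sorted position; the caller guarantees a free slot. */
static void put_sorted(struct blk *b, int v)
{
    int i = b->n;
    while (i > 0 && b->val[i - 1] > v) {
        b->val[i] = b->val[i - 1];   /* shift larger entries right */
        i--;
    }
    b->val[i] = v;
    b->n++;
}

/* Add v to block b, the block that brackets v. nb is a spare empty block
   that the caller links in after b when this returns 1. */
static int index_add(struct blk *b, struct blk *nb, int is_last, int v)
{
    /* last block with six or fewer empty slots: start a new last block */
    if (is_last && SLOTS - b->n <= SLACK - 1 && v > b->val[b->n - 1]) {
        nb->n = 0;
        put_sorted(nb, v);
        return 1;
    }
    if (b->n < SLOTS) {
        put_sorted(b, v);
        return 0;
    }
    /* overflow: copy the last SLACK entries to nb, then add v where it sorts */
    memcpy(nb->val, b->val + (SLOTS - SLACK), SLACK * sizeof b->val[0]);
    nb->n = SLACK;
    b->n = SLOTS - SLACK;
    put_sorted(v > b->val[b->n - 1] ? nb : b, v);
    return 1;
}
```

After a split, the old block is left with 120 entries, which restores the seven-slot slack that a newly created index starts with.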

Indices are used for set linkage as well as to facili-
tate the maintenance of a "NO DUPLICATES" clause referring to
items not linked in another way. Indices are linked to the
records when needed by storing the starting block number of
the index in the record.

In the absence of P and V operators [Ref. 22], an index
may be reserved for use by a particular copy of the schema
DBM in the following tortuous manner. When an index is to
be accessed, the schema DBM attempts to create a file
("indexdum") in the schema directory. This file is created
with a mode that does not allow writing; therefore, should
another process attempt to create "indexdum" while it
exists, the attempt will fail. If a create fails, it is
repeated until successful. Once the "indexdum" file has
been created, the first block of the index is read from the
index file. If the backward link field of the first block
is minus one, "indexdum" is closed and destroyed and the
above process is repeated until the backward link is zero.
The backward link is then set to minus one and the block
written. The "indexdum" file is then destroyed. If an index
is not available, the schema DBM releases any indices it has
allocated in order to avoid deadlocks.
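The lock-file protocol can be sketched in modern POSIX C. This swaps in a different primitive: instead of relying on `creat` failing on an existing file without write permission, it uses `open` with `O_CREAT | O_EXCL`, which fails atomically with `EEXIST` if the file exists and, unlike the permission trick, also holds for the super-user, who bypasses permission checks. The backward link is modeled by an `int` rather than a block read from the index file, and all names are hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Create the lock file, retrying until no other process holds it. */
static int lockfile_acquire(const char *path)
{
    for (;;) {
        int fd = open(path, O_RDONLY | O_CREAT | O_EXCL, 0444);
        if (fd >= 0) {
            close(fd);
            return 0;
        }
        if (errno != EEXIST)
            return -1;          /* a real error, not contention */
        sleep(1);               /* back off before retrying (an assumption) */
    }
}

static int lockfile_release(const char *path)
{
    return unlink(path);        /* "destroy" the lock file */
}

/* One reservation attempt. *iblink models the backward link of the
   index's first block: 0 = free, -1 = in use.
   Returns 1 if reserved, 0 if busy, -1 on error. */
static int try_reserve_index(const char *lockpath, int *iblink)
{
    int got = 0;
    if (lockfile_acquire(lockpath) != 0)
        return -1;
    if (*iblink == 0) {         /* real code: read the block, set -1, rewrite it */
        *iblink = -1;
        got = 1;
    }
    lockfile_release(lockpath);
    return got;
}

/* Release: set the backward link back to zero and rewrite the block. */
static void release_index(int *iblink)
{
    *iblink = 0;
}
```

A caller that gets 0 back would, per the text, release any indices it already holds before retrying, so that two processes cannot wait on each other indefinitely.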

When an index is to be released, the backward link of
the first index block is set to zero and the block is
rewritten. Thus, only one schema DBM can use a particular
index at any given time. This system avoids integrity prob-
lems stemming from simultaneous update of an index block by
two different copies of the schema DBM.

During the course of day-to-day operations, indices will
be created as well as discarded. When an index is of no
further use, its blocks must be made available for recycl-
ing. The free blocks thus created are accounted for on a
free block list similar to the free list in a UNIX super
block. Block zero of the schema index file is used to
contain the free list. The first word of block zero con-
tains the block number of a free list block, or zero if none
exists. The remaining 255 words are used to store the block
numbers of free blocks.

The free list is maintained as follows. When a block
must be added to the free list, its block number is stored
in the first available slot in block zero. If all the
slots are taken, block zero is copied into the free block,
the block number of this free block is stored in the first
word of block zero, and the remainder of block zero is
filled with zeroes. If a free block must be allocated to an
index, the last non-null block number is extracted from
block zero and the slot is cleared. If no free blocks are
on the list and word zero contains a block number, that
block is allocated to the index after first copying its con-
tents into block zero. If block zero is all zeroes, a new
block is added to the end of the file for use in the index.
Other schema DBM processes are locked out during block
acquisition and freeing by the same mechanism used to gain
control of an index.
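The free-list discipline is essentially the classic UNIX super-block scheme, and can be sketched over a simulated index file of 512-byte blocks of 16-bit words. The array, the bound of 1024 blocks and the function names are hypothetical, and the locking mentioned above is omitted.

```c
#include <stdint.h>
#include <string.h>

#define WORDS 256            /* a 512-byte block of 16-bit words */
#define MAXBLK 1024          /* hypothetical bound on the simulated file */

static uint16_t disk[MAXBLK][WORDS];   /* simulated schema index file */
static uint16_t nblocks = 1;           /* block 0 (the free list) always exists */

/* Return block b to the free list kept in block zero. */
static void free_block(uint16_t b)
{
    for (int i = 1; i < WORDS; i++)
        if (disk[0][i] == 0) {         /* first available slot */
            disk[0][i] = b;
            return;
        }
    /* all slots taken: copy block zero into b, chain b from word 0 */
    memcpy(disk[b], disk[0], sizeof disk[0]);
    memset(disk[0], 0, sizeof disk[0]);
    disk[0][0] = b;
}

/* Allocate a block for an index; returns 0 if the file is full. */
static uint16_t alloc_block(void)
{
    for (int i = WORDS - 1; i >= 1; i--)
        if (disk[0][i] != 0) {         /* last non-null block number */
            uint16_t b = disk[0][i];
            disk[0][i] = 0;
            return b;
        }
    if (disk[0][0] != 0) {             /* refill block zero from the chained block */
        uint16_t b = disk[0][0];
        memcpy(disk[0], disk[b], sizeof disk[0]);
        return b;
    }
    if (nblocks >= MAXBLK)
        return 0;
    return nblocks++;                  /* extend the file */
}
```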

J. Privacy.

The CODASYL DBMS design allows for privacy locks to be
established at all levels. It allows separate privacy locks
for each function on a resource. Additionally, it allows a
privacy lock to be either a string to be matched or a lock
procedure. The UNIX implementation features all these
options. Their implementation is accomplished as follows.

When the files in the schema directory are created, UNIX
establishes the installation's Data Base Administrator as
their owner. By making the access privileges of a file read
and write for owner only (UNIX function "chmod"), the Data
Base Administrator can prohibit all other users in the sys-
tem from opening the file. Thus, only the Data Base
Administrator (or super user) can directly read or write a
schema file.

The DBM Request Processor may be used both for system
maintenance and to initiate user execution. However, since
the DBM Request Processor does not set-user-id, it can only
perform system maintenance functions when used by the Data
Base Administrator.

Since the schema DBM program file is executable by any
user, the DBM Request Processor can initiate it to process
user requests. Since the schema DBM does a set-user-id to
the user id of the Data Base Administrator, it can access
the schema files as required. A user must not be able to
penetrate the schema DBM to gain access to information for
which he does not have the privacy keys.

The schema DBM prevents unauthorized access using the
following procedure. The schema DBM has a privacy flag for
every function/resource pair for which a privacy lock can be
defined. When the schema DBM validates the user's sub-
schema (contained in the initial call message), it checks
the privacy keys defined in the sub-schema. Each privacy
flag for which no lock is defined, or for which the sub-
schema privacy key is valid, is set to allow access; other-
wise it is set to deny access. The user receives no immedi-
ate indication as to whether or not his privacy keys fit the
locks. If he later tries to access some data base resource
in a way for which he did not furnish acceptable privacy
keys, his request fails. Once an initial response message
is accepted by the schema DBM, no further unlocking of
resources can be done. Thus, in order to access the denied
resource, the user program must be terminated and restarted
using a fresh copy of the schema DBM, which must be provided
the proper privacy keys. Consequently, no single execution
of a user program can determine the valid privacy keys
through trial and error.
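The privacy flags can be sketched as a table filled in once while the sub-schema is validated and frozen afterwards. The function codes, table size and the sample lock procedure `dba_only` are hypothetical; a lock is modeled as either a literal string or a procedure pointer, matching the two forms the CODASYL design allows. Flags that are never set default to "deny" in this sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum dbfunc { F_FIND, F_GET, F_STORE, F_MODIFY, F_DELETE, NFUNC };  /* hypothetical */
#define NRES 8                     /* hypothetical number of lockable resources */

struct lock {
    const char *literal;                 /* string to be matched, or NULL */
    bool (*proc)(const char *key);       /* lock procedure, or NULL */
};

static bool allow[NRES][NFUNC];          /* one privacy flag per function/resource */
static bool frozen;                      /* set once initialization is over */

static bool key_fits(const struct lock *lk, const char *key)
{
    if (lk->literal == NULL && lk->proc == NULL)
        return true;                     /* no lock defined */
    if (key == NULL)
        return false;                    /* lock defined but no key supplied */
    return lk->literal ? strcmp(lk->literal, key) == 0 : lk->proc(key);
}

/* Called while validating the sub-schema; ignored afterwards. */
static void set_privacy(int res, enum dbfunc f, const struct lock *lk, const char *key)
{
    if (!frozen)
        allow[res][f] = key_fits(lk, key);
}

static void end_initialization(void) { frozen = true; }

static bool permitted(int res, enum dbfunc f) { return allow[res][f]; }

/* Example lock procedure (hypothetical): accept keys beginning with "DBA". */
static bool dba_only(const char *key) { return strncmp(key, "DBA", 3) == 0; }
```

Because `set_privacy` does nothing once `end_initialization` has run, a wrong key cannot be retried within the same run, which is the property the text relies on.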

K. Integrity.

As previously mentioned, the CODASYL DDLC envisioned
that the DBMS routines would be contained in a single pro-
cess which would service all users. That concept guarantees
the integrity of the data base, since simultaneous update of
the data base is impossible. Additionally, a concept called
"keep" status is included in the COBOL DML [Ref. 3]. A
record has automatic "keep" status for a run-unit during the
time it is the current record of that run-unit. A run-unit
can also request "keep" status for a record if it desires to
be informed of what happens to the record. If a run-unit
modifies or deletes a record which has "keep" status for
another run-unit, the run-unit having the record in "keep"
status will be notified of the action. Although the "keep"
mechanism does not resolve the problem of concurrent update,
it does provide a mechanism for identifying potential prob-
lems. "Keep" status allows run-units to update the data
base while still allowing others access to it.

Since each user process is coupled to its own version of
the schema DBM, none of the above features can be readily
implemented. As a consequence, if multiple schema DBM pro-
grams concurrently open an area for update, data base
integrity problems are virtually assured. If, however, all
users open any area to be updated for protected or exclusive
use (see Appendix A, Section B.2.b), no integrity problems
can arise.

An area opened for protected use cannot be opened by
another process for update. An area opened for exclusive
use cannot be opened at all by another process. The mechan-
ism for ensuring that these rules are enforced is the Logi-
cal Usage Block File. This file resides in the "tmp" direc-
tory and has the same name as the schema (hence the rule
that no two data bases may have the same name). It contains
the logical usage block, which records the opening mode of
every area currently open by any copy of the schema DBM.
When a schema DBM desires to open an area, it reads the
logical usage block and determines whether or not a conflict
exists between the opening mode it desires and the modes in
use by other processes having the area open. If no conflict
exists, it opens the area and updates the logical usage
block accordingly; otherwise, it notifies the user that the
open has failed.

In order to avoid problems with simultaneous update of
the logical usage block by different processes, a lock-out
file mechanism similar to "indexdum" is employed. This file
is named "opendum" and resides in the schema directory. The
"opendum" file is created to lock out other processes while
the logical usage block is being accessed.
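The conflict test against the logical usage block might look like the sketch below. The text states two rules (protected use excludes other updaters, exclusive use excludes everyone); applying them symmetrically, and the layout of a usage-block entry, are assumptions. Two shared updaters are not treated as a conflict, matching the observation above that such use is possible but risky. The read-check-update sequence would run while holding the "opendum" lock file.

```c
#include <stdbool.h>

enum usage { RETRIEVAL, UPDATE };
enum control { SHARED, PROTECTED, EXCLUSIVE };

struct open_mode { enum usage use; enum control ctl; };

/* Do two processes' opening modes for the same area conflict? */
static bool conflicts(struct open_mode a, struct open_mode b)
{
    if (a.ctl == EXCLUSIVE || b.ctl == EXCLUSIVE)
        return true;                          /* exclusive excludes everyone */
    if (a.ctl == PROTECTED && b.use == UPDATE)
        return true;                          /* protected excludes updaters */
    if (b.ctl == PROTECTED && a.use == UPDATE)
        return true;
    return false;
}

#define MAXOPEN 16

/* Hypothetical logical usage block entry: one per open area per process. */
struct lub_entry { int area; int pid; struct open_mode mode; };

/* Record the open if no other process's mode conflicts; false = open fails. */
static bool lub_open(struct lub_entry lub[], int *n, int area, int pid, struct open_mode m)
{
    for (int i = 0; i < *n; i++)
        if (lub[i].area == area && lub[i].pid != pid && conflicts(lub[i].mode, m))
            return false;
    if (*n == MAXOPEN)
        return false;
    lub[*n] = (struct lub_entry){ area, pid, m };
    (*n)++;
    return true;
}
```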

L. The Schema DBM.

As mentioned earlier, the DBMS compiler must produce a
schema DBM program when a schema is compiled. The schema
DBM program is composed of two parts: the schema constants
and the DBM skeleton.

1. Schema Constants.

The DBMS compiler must produce a C-coded temporary
file which contains all the schema-unique constants neces-
sary to tailor the DBM skeleton to the schema being com-
piled. These constants cause the various buffers and arrays
used by the DBM program to be allocated sufficient memory to
handle the schema. For a description of the constants and
arrays involved, see Appendix H.

Additionally, this temporary file must include the
initialization for the arrays "procpoint" and "procname".
These arrays contain, respectively, pointers to and the
names of all the data base procedures in the schema. When-
ever a data base procedure name is encountered during the
initialization phase of the schema DBM, "procname" is
searched until the matching name is found, and a pointer to
the data base procedure is extracted from the corresponding
"procpoint" entry.

When the constant file has been generated, the DBMS
compiler can use the C compiler to form the schema DBM from
the constant file and skeleton DBM. Since pointers to the
data base procedures are used as initializing constants in
"procpoint", all the data base procedures will automatically
be loaded into the output object module after compilation.
Both the DBM skeleton and the data base procedures must
exist as object modules available to the C compiler. All
the external arrays which are dimensioned in the constant
file are declared, but not explicitly dimensioned, in the
skeleton DBM. The finished product of the C compiler is an
executable schema DBM.
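The constant-file arrangement can be sketched in a single C file: the first part shows what a generated constant file might contain, and the second shows the skeleton's undimensioned `extern` declarations and the name lookup done during initialization. `procpoint` and `procname` come from the text; `nproc`, `find_proc`, the procedure type `dbproc_t` and the two sample procedures are hypothetical.

```c
#include <stddef.h>
#include <string.h>

typedef int (*dbproc_t)(void *arg);   /* hypothetical procedure signature */

/* ---- as a generated constant file might look (hypothetical contents) ---- */

/* Sample data base procedures; really separate object modules. */
int chkbudget(void *arg) { (void)arg; return 1; }
int deptcalc(void *arg)  { (void)arg; return 2; }

int nproc = 2;                                    /* hypothetical count */
dbproc_t procpoint[] = { chkbudget, deptcalc };   /* forces both to be loaded */
const char *procname[] = { "chkbudget", "deptcalc" };

/* ---- as the DBM skeleton might declare them: not dimensioned ---- */

extern int nproc;
extern dbproc_t procpoint[];
extern const char *procname[];

/* Map a procedure name read from the schema description to its pointer. */
dbproc_t find_proc(const char *name)
{
    for (int i = 0; i < nproc; i++)
        if (strcmp(procname[i], name) == 0)
            return procpoint[i];
    return NULL;
}
```

Because `procpoint` is initialized with the procedures' addresses, the linker must pull in every listed procedure, which is the effect the text describes.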

2. The DBM Skeleton.

The DBM skeleton is an object module which contains
all the DBMS routines (except data base procedures) required
to provide user services for the data base. Appendix F con-
tains a complete description of the DBM skeleton. The pro-
cessing of the skeleton (and thus of the schema DBM) is
divided into two phases: initialization and user request
processing.

a. Initialization Phase.

When the schema DBM is called, it has no infor-
mation about the organization of the schema or sub-schema.
Although all its buffers are the right size and all the
necessary data base procedures are compiled into it, it has
no knowledge of data base names, privacy locks, set rela-
tionships or any other data peculiar to the schema. In
order to function, it must read in all the data in the
schema description file. Concurrently, it processes the
user program's sub-schema, which is passed in the user
program's initial call message. By validating the sub-
schema while initializing the schema, the schema DBM can
immediately translate all references into terms of its
internal index numbers rather than store data base names.
This avoids much of the matching overhead for each user
request. If the sub-schema fails the validation, the schema
DBM sends the user program an error message, returns to the
beginning of the schema description file and restarts ini-
tialization. The initialization phase will thus be ter-
minated only if the user program either submits a valid
sub-schema or terminates.

b. User Request Servicing Phase.

After a successful initialization, the user
request servicing phase begins. This phase consists of a
loop which reads a user message, processes it, and returns a
response to the user program. The loop runs until user pro-
gram termination, at which point the schema DBM terminates.

Processing a user message is accomplished by
selecting a service routine based on the message type. One
service routine exists for each message type except the ini-
tial call, with an additional routine to process invalid
message types. During this phase, an initial call is con-
sidered an invalid message. Each service routine uses one
or more utility routines. Utility routines are general-pur-
pose data base access and maintenance primitives which may
be used by several service routines.
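The servicing phase is a read-dispatch-respond loop over a table of service routines. This sketch uses hypothetical message types and stub routines in place of the real ones; the initial-call slot maps to the invalid-message routine, as described above, and `read_msg` and `respond` are hypothetical callbacks standing in for the message channel to the user program.

```c
enum msgtype { MSG_INITIAL, MSG_FIND, MSG_GET, MSG_STORE, MSG_FINISH, NMSG };  /* hypothetical */

struct message {
    int type;
    /* request body omitted in this sketch */
};

typedef int (*service_t)(struct message *);

/* Stub service routines; real ones would call the utility routines. */
static int svc_find(struct message *m)    { (void)m; return 1; }
static int svc_get(struct message *m)     { (void)m; return 2; }
static int svc_store(struct message *m)   { (void)m; return 3; }
static int svc_finish(struct message *m)  { (void)m; return 4; }
static int svc_invalid(struct message *m) { (void)m; return -1; }

/* One routine per message type; an initial call is invalid in this phase. */
static const service_t service[NMSG] = {
    [MSG_INITIAL] = svc_invalid,
    [MSG_FIND]    = svc_find,
    [MSG_GET]     = svc_get,
    [MSG_STORE]   = svc_store,
    [MSG_FINISH]  = svc_finish,
};

static int dispatch(struct message *m)
{
    if (m->type < 0 || m->type >= NMSG)
        return svc_invalid(m);          /* unknown message type */
    return service[m->type](m);
}

/* read_msg returns 0 once the user program has terminated. */
void service_loop(int (*read_msg)(struct message *), void (*respond)(int))
{
    struct message m;
    while (read_msg(&m) > 0)
        respond(dispatch(&m));
}
```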

IV. DESIGN OF THE C DDL AND DML.

A. Design Goals and Decisions.

The augmentation to the C language was designed to pro-
vide a natural interface between the C language and the DBMS
without reducing the DBMS's ability to support a COBOL DDL
and DML. Accordingly, the C DML was designed to have as much
functional similarity to the COBOL DML as was feasible. This
goal was adopted to support the research objective of test-
ing CODASYL's contention that the DBMS could support a
variety of sub-schema DDLs and DMLs. See Section C below
for a comparison of the COBOL and C DDL and DML.

One of the desirable goals of a DBMS is to provide pro-
gram independence from the definition of the data base.
Additionally, one of the primary design philosophies of C
was economy of expression. In order to facilitate both of
these goals, the C DDL provides for describing only a
minimal subset of the relationships and restrictions which
appear within a schema description. The DDL is restricted
to describing the names and privacy keys for areas, records,
data items, data aggregates and sets, and the membership of
record types in set types. Data types of items and aggre-
gates must be specified as well, but this information may be
different from the data types recorded in the schema
description. Additionally, the DDL need only describe those
portions of the schema the program is interested in manipu-
lating. Since this information is the only data absolutely
necessary to the DML, a recompilation of the program should
seldom be needed unless major changes affecting the validity
of the program logic occur in the schema. The programmer
obviously needs to know a lot more about the schema; however,
he can (and should) obtain this information from an
installation Data Element Dictionary [Ref. 19].

A fourth goal was to integrate the C DDL and DML into
the host language's structure whenever possible. Accord-
ingly, the DDL was grouped into a special external function,
and the DML functions have been given formats similar to
other C special functions such as "return". DML logical
expressions are compatible with normal C expressions.

B. Major Concepts.

This section describes some of the concepts essential to
the implementation and use of the C DDL and DML. For a com-
plete definition of the C DDL and DML, see Appendix A.

1. Currency.

The concept of currency is central to the navigation
of access paths in the DML. The user process, as well as
each record, set and area type known to it, has a current
record associated with it. This currency is established
by the execution of a "find". When a record is found, it
becomes the current record of the process, of its record
type, of the area in which it resides and of the set type of
every set occurrence the record participates in as a member
or owner. The current set occurrence of each set type is
the set in which the current record of that set type
participates.

Information about a record, including data values if
the record was fetched by "get", continues to be available
until the record is replaced as the current record every-
where its currency was originally established. For example,
if a record is the current record of a particular area, it
will remain available as the current record of that area
until a "find" is executed which selects a different record
residing in the same area.

2 .   Find  versus  Get. 

The difference between the "find" and "get" functions is
that the former locates records in the data base while the
latter extracts data values from the data base. When a
"find" is executed, the DBMS spans the access paths specified
in the record selection criterion and returns all the
information about a record necessary to make it current in
the appropriate places. For a description of the record
selection options available, see Appendix A, Section B.1.b.
The information necessary to establish currency includes the
selected record's data base key, its record type and the set
types of the sets it participates in as a member or owner.
The area the record resides in can be derived from its data
base key.

A "get" is used to access the data values associated with a
record. The record must have been made the current record of
the process prior to executing the "get". The values of the
record's data items are available via pointers associated
with each entity for which the record is current. Whether an
implementor elects to provide separate buffers for each
currency type or merely reassigns the values of pointers is
immaterial.
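The currency rules above can be modelled in a few lines of plain C. This is a minimal sketch under stated assumptions, not the thesis's implementation: `struct currency`, `currency_init`, `find_record` and `get_record` are hypothetical names, a data base key is simply an index into an in-memory array, and a real "find" would first evaluate a record selection expression. It shows that a find only updates currency indicators, while a get is the step that moves data values.

```c
#include <assert.h>
#include <string.h>

#define NO_RECORD (-1)          /* currency indicator not yet set */
enum { NAREAS = 2, NRECTYPES = 5, NSETTYPES = 4 };

/* One stored record; in this sketch its data base key is its index. */
struct stored_record {
    int area;                   /* the real DBMS derives this from the key */
    int type;                   /* record type number */
    int in_set[NSETTYPES];      /* nonzero if owner or member of set type s */
    char data[32];              /* item values, as a "get" would deliver them */
};

/* Currency indicators, each holding a data base key. */
struct currency {
    int process;                /* current record of the process */
    int of_area[NAREAS];
    int of_type[NRECTYPES];
    int of_set[NSETTYPES];
};

void currency_init(struct currency *c)
{
    c->process = NO_RECORD;
    for (int i = 0; i < NAREAS; i++)    c->of_area[i] = NO_RECORD;
    for (int i = 0; i < NRECTYPES; i++) c->of_type[i] = NO_RECORD;
    for (int i = 0; i < NSETTYPES; i++) c->of_set[i] = NO_RECORD;
}

/* "find": make record `key` current everywhere it applies.
 * Only keys are recorded; no data values are moved. */
void find_record(struct currency *c, const struct stored_record *db, int key)
{
    const struct stored_record *r = &db[key];
    c->process = key;
    c->of_area[r->area] = key;
    c->of_type[r->type] = key;
    for (int s = 0; s < NSETTYPES; s++)
        if (r->in_set[s])
            c->of_set[s] = key;
}

/* "get": copy the data values of the current record of the process
 * into the user's buffer. Returns -1 if no record has been made current. */
int get_record(const struct currency *c, const struct stored_record *db,
               char *buffer, size_t size)
{
    assert(buffer != NULL && size > 0);
    if (c->process == NO_RECORD)
        return -1;
    strncpy(buffer, db[c->process].data, size - 1);
    buffer[size - 1] = '\0';
    return 0;
}
```

For example, finding an AGENT record and then one of its SKILL records leaves the AGENT record current of its record type and of the DEPARTMENT-AGENT set; only the current record of the process changes, and that is what the following get delivers.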

3. Independence of Schema and Sub-schema.

A user program can be compiled without reference to the
schema description. A program using a sub-schema will
continue to execute until the data base name, privacy locks
or set memberships described in the sub-schema are changed or
deleted in the schema. However, changing entries such as
record location modes and set selection clauses in the schema
may alter the logic of a program.

The data types of the sub-schema may differ from those of
the schema. The DBMS will automatically convert data to the
types desired by the sub-schema before delivery. Some type
differences, however, may cause an error if the data involved
is incompatible. For example, converting a string of
characters to an integer will fail if the string contains any
non-numeric characters.
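A minimal sketch of the failing conversion, assuming the schema item is a fixed-length numeric character field such as PICTURE "9(3)" and the sub-schema asks for an int. The name `convert_numeric_field` is hypothetical, and overflow checking is omitted for brevity.

```c
#include <assert.h>
#include <stddef.h>

/* Convert a fixed-length numeric character field (not NUL-terminated)
 * to an int. Returns 0 on success, -1 if any character is not a decimal
 * digit, i.e. the data is incompatible with the requested type. */
int convert_numeric_field(const char *field, size_t len, int *out)
{
    int value = 0;
    assert(field != NULL && out != NULL);
    for (size_t i = 0; i < len; i++) {
        if (field[i] < '0' || field[i] > '9')
            return -1;                      /* non-numeric: conversion fails */
        value = value * 10 + (field[i] - '0');
    }
    *out = value;
    return 0;
}
```

Under this sketch the schema value "007" arrives in the sub-schema as the integer 7, while a field such as "0A7" is rejected.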
C. Comparison with the COBOL DDL and DML.

The comparison will be made by means of an illustrative
example. For a detailed description of the C and COBOL DDL
and DML see Appendix A and Ref. 3 respectively. The COBOL
portion of the example is patterned after Ref. 20. Several
modifications were made to reflect both recent changes in the
definition of the CODASYL DBMS and the UNIX implementation
data types.

The example concerns the representation of a personnel data
base. Figure 3 is a diagram of the network and records
involved. In Fig. 3, the rectangles represent records and the
arrows point from set owners to set members. Record names are
written above rectangles; item values within. Set names are
superimposed over the set linkage arrows. Multiple agents may
be linked to an assignment and multiple assignments to an
agent. The fact that a particular agent is assigned to a
particular assignment is represented by the existence of a
LINK record which has membership in both the AGENT-LINK set
owned by that agent's AGENT record and the ASSIGNMENT-LINK set
owned by that assignment's ASSIGNMENT record. The link
records are made necessary by the restriction that a record
may only be a member of one set of a given type.
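The LINK-record technique can be sketched in plain C. This is a hypothetical flat representation: a real CODASYL DBMS chains the members of each set occurrence, whereas here each LINK simply records its owner in the AGENT-LINK set and its owner in the ASSIGNMENT-LINK set. The struct `link` and function `agents_on_assignment` are invented names. Each LINK belongs to exactly one set of each type, yet together the links express a many-to-many relationship.

```c
#include <assert.h>
#include <stddef.h>

/* One LINK record, reduced to the owners of the two sets it belongs to. */
struct link {
    int agent;        /* owner of its AGENT-LINK set */
    int assignment;   /* owner of its ASSIGNMENT-LINK set */
};

/* Walk the ASSIGNMENT-LINK set of one assignment and, for each LINK,
 * take its owner in the AGENT-LINK set. Returns the number of agents
 * written to agents[] (at most max). */
size_t agents_on_assignment(const struct link *links, size_t nlinks,
                            int assignment, int *agents, size_t max)
{
    size_t n = 0;
    assert(links != NULL || nlinks == 0);
    for (size_t i = 0; i < nlinks && n < max; i++)
        if (links[i].assignment == assignment)
            agents[n++] = links[i].agent;
    return n;
}
```

Agent 7 can appear in two links (two assignments) while assignment 2 is reached from two links (two agents), which a single set type could not represent.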

Figures 4 and 5 show the DDL description of the schema. The
COBOL sub-schema could be essentially an exact copy of the
schema. The C sub-schema is shown in Fig. 6. Note that an
agent's number is represented as a numeric character string
in the schema and as an integer in the C sub-schema. In the C
sub-schema, all the data base names are spelled with lower
case letters and with hyphen ("-") replaced by underscore
("_"). The DBMS translates identifiers to allow for this
difference in spelling conventions. In the UNIX
implementation, data base names spelled with any combination
of upper and lower case letters will be properly recognized.
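A minimal sketch of the identifier translation just described, assuming schema names are stored in upper case with hyphens. The function `names_match` is a hypothetical name: it maps lower case to upper case and underscore to hyphen while comparing, so any mix of upper and lower case in the C spelling is recognized.

```c
#include <assert.h>
#include <ctype.h>

/* Return 1 if the C sub-schema identifier denotes the given schema name,
 * comparing after translating lower case to upper case and "_" to "-". */
int names_match(const char *c_name, const char *schema_name)
{
    assert(c_name != 0 && schema_name != 0);
    for (; *c_name != '\0' && *schema_name != '\0'; c_name++, schema_name++) {
        unsigned char ch = (unsigned char)*c_name;
        char translated = (ch == '_') ? '-' : (char)toupper(ch);
        if (translated != *schema_name)
            return 0;
    }
    /* Both names must end together; a prefix is not a match. */
    return *c_name == '\0' && *schema_name == '\0';
}
```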

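   [Diagram not reproduced. Recoverable contents: a DEPARTMENT
   record (Function: espionage; Head: M); an AGENT record
   (Name: James Bond; Number: 007); ASSIGNMENT records
   (Name: Goldfinger; Name: Thunderball); SKILL records
   (Name: spy, Level: 1; a second skill at Level: 10). Set
   arrows labelled DEPARTMENT-AGENT and AGENT-SKILL, and the
   AGENT-LINK and ASSIGNMENT-LINK sets through LINK records.]

          Network Representation of a Data Base
                      Figure 3.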
SCHEMA NAME IS PERSONNEL-FILE.

AREA NAME IS DEPARTMENT-AREA.
AREA NAME IS ASSIGNMENT-AREA.

RECORD NAME IS DEPARTMENT;
    LOCATION MODE IS CALC USING FUNCTION
        DUPLICATES ARE NOT ALLOWED;
    WITHIN DEPARTMENT-AREA.
01  FUNCTION; PICTURE IS "A(20)".
01  HEAD; PICTURE IS "A(1)".

RECORD NAME IS AGENT;
    LOCATION MODE IS CALC USING NUMBER
        DUPLICATES ARE NOT ALLOWED;
    WITHIN DEPARTMENT-AREA.
01  FIRST-NAME; PICTURE IS "A(10)".
01  LAST-NAME; PICTURE IS "A(10)".
01  NUMBER; PICTURE IS "9(3)".

RECORD NAME IS SKILL;
    LOCATION MODE IS VIA AGENT-SKILL SET;
    WITHIN AREA OF OWNER.
01  NAME; PICTURE IS "A(20)".
01  LEVEL; TYPE IS FIXED DECIMAL.

RECORD NAME IS ASSIGNMENT;
    LOCATION MODE IS CALC USING NAME OF ASSIGNMENT;
    WITHIN ASSIGNMENT-AREA.
01  NAME; PICTURE IS "A(20)".

RECORD NAME IS LINK;
    LOCATION MODE IS VIA AGENT-LINK SET;
    WITHIN AREA OF OWNER.


          Schema DDL Area and Record Entries
                      Figure 4.
SET NAME IS DEPARTMENT-AGENT;
    OWNER IS DEPARTMENT;
    ORDER IS PERMANENT INSERTION IS SORTED BY DEFINED KEYS;
    MEMBER IS AGENT MANDATORY AUTOMATIC LINKED TO OWNER;
        KEY IS ASCENDING NUMBER;
        SET SELECTION IS THRU DEPARTMENT-AGENT OWNER
            IDENTIFIED BY CURRENT OF SET.

SET NAME IS AGENT-SKILL;
    OWNER IS AGENT;
    ORDER IS PERMANENT INSERTION IS SORTED BY DEFINED KEYS;
    MEMBER IS SKILL MANDATORY AUTOMATIC;
        KEY IS DESCENDING LEVEL;
        SET SELECTION IS THRU AGENT-SKILL OWNER
            IDENTIFIED BY CURRENT OF SET.

SET NAME IS AGENT-LINK;
    OWNER IS AGENT;
    ORDER IS PERMANENT INSERTION IS IMMATERIAL;
    MEMBER IS LINK MANDATORY AUTOMATIC LINKED TO OWNER;
        SET SELECTION IS THRU AGENT-LINK OWNER
            IDENTIFIED BY CALC-KEY EQUAL TO CURRENT-AGENT.

SET NAME IS ASSIGNMENT-LINK;
    OWNER IS ASSIGNMENT;
    ORDER IS PERMANENT INSERTION IS IMMATERIAL;
    MEMBER IS LINK MANDATORY AUTOMATIC LINKED TO OWNER;
        SET SELECTION IS THRU ASSIGNMENT-LINK OWNER
            IDENTIFIED BY CALC-KEY EQUAL TO CURRENT-ASSIGNMENT.


                 Schema DDL Set Entries.
                       Figure 5.
ddl {
    schema personnel_file;
    area department_area;
    area assignment_area;
    record department {
        char function[20];
        char head[1];
    }
    record agent {
        char first_name[10];
        char last_name[10];
        int number;
    }
    record skill {
        char name[20];
        int level;
    }
    record assignment {
        char name[20];
    }
    record link {
    }
    set department_agent owner is department {
        member agent;
    }
    set agent_skill owner is agent {
        member skill;
    }
    set agent_link owner is agent {
        member link;
    }
    set assignment_link owner is assignment {
        member link;
    }
}


                C Sub-schema Entries.
                      Figure 6.
1. Query 1.

The first query is designed to extract the skills of agent
007. The procedure is to initialize the agent number, FIND the
agent and print the agent number, skill name and skill level
for each skill the agent has (if any). The COBOL and C
realizations of Query 1 are shown in Figs. 7 and 8
respectively. In both versions access to the agent is via the
CALC key in the AGENT record, and the appropriate AGENT-SKILL
set is automatically selected when agent 007 becomes current.
The agent record need not be fetched since none of its data
fields are needed.


     OPEN ALL.
 FIND-AGENT-RECORD.
     MOVE '007' TO NUMBER OF AGENT.
     FIND AGENT RECORD.
 READ-FIRST-SKILL.
     FIND FIRST SKILL RECORD OF AGENT-SKILL SET.
     IF ERROR-STATUS = 0326 GO TO ALL-DONE.
 PRINT-SKILL.
     GET.
     DISPLAY 'AGENT = ', NUMBER OF AGENT, ', SKILL = ',
         NAME OF SKILL, ', LEVEL = ', LEVEL OF SKILL.
 READ-NEXT-SKILL.
     FIND NEXT SKILL RECORD OF AGENT-SKILL SET.
     IF ERROR-STATUS = 0307 GO TO ALL-DONE
         ELSE GO TO PRINT-SKILL.
 ALL-DONE.


              Query 1 Coded in COBOL
                     Figure 7.
dbopen();
agent.number = 7;
find(agent);
for (find(first skill of agent_skill); !error.status; ) {
    get();
    printf("Agent = %d, Skill = %.20s, Level = %d\n",
        agent.number, skill.name, skill.level);
    find(next skill of agent_skill);
}


                Query 1 Coded in C.
                     Figure 8.


2. Query 2.

The second example query is designed to find all department
heads concerned with the assignment "THUNDERBALL". The
procedure is as follows. FIND the ASSIGNMENT record whose NAME
is "THUNDERBALL". For each LINK record in the assignment's
ASSIGNMENT-LINK set,

 .  find the link's owner in the AGENT-LINK set it belongs to,

 .  find that AGENT record's owner in the DEPARTMENT-AGENT set
    it belongs to and

 .  print the assignment name and department head.

The COBOL and C realizations of Query 2 are shown in Figs. 9
and 10 respectively. In both versions, once the desired link
is made current, the appropriate DEPARTMENT record can be
reached via the AGENT-LINK and DEPARTMENT-AGENT sets. Note
that only the DEPARTMENT records need to be fetched, since all
the spanning of access paths is accomplished through currency
information.
     OPEN ALL.
 FIND-ASSIGNMENT-RECORD.
     MOVE 'THUNDERBALL' TO NAME OF ASSIGNMENT.
     FIND ASSIGNMENT RECORD.
 FIND-FIRST-LINK.
     FIND FIRST LINK RECORD OF ASSIGNMENT-LINK SET.
     IF ERROR-STATUS = 0326 GO TO ALL-DONE.
 FIND-AGENT-LINK-OWNER.
     FIND OWNER RECORD OF AGENT-LINK SET.
 READ-DEPARTMENT-RECORD.
     FIND OWNER RECORD OF DEPARTMENT-AGENT SET.
     GET.
 PRINT-DEPARTMENT-HEAD.
     DISPLAY 'ASSIGNMENT = ', NAME OF ASSIGNMENT,
         ', HEAD = ', HEAD OF DEPARTMENT.
 FIND-NEXT-LINK.
     FIND NEXT LINK RECORD OF ASSIGNMENT-LINK SET.
     IF ERROR-STATUS = 0307 GO TO ALL-DONE
         ELSE GO TO FIND-AGENT-LINK-OWNER.
 ALL-DONE.


              Query 2 Coded in COBOL.
                     Figure 9.


dbopen();
for (i = 0; i < 12; i++) assignment.name[i] = "THUNDERBALL"[i];
find(assignment);
for (find(first link of assignment_link); !error.status; ) {
    find(owner of agent_link);
    find(owner of department_agent);
    get();
    printf("Assignment = %.20s, Head = %c\n",
        assignment.name, department.head[0]);
    find(next link of assignment_link);
}


                Query 2 Coded in C.
                     Figure 10.
V. CONCLUSIONS AND RECOMMENDATIONS.


A. Conclusions.

Since the DBMS compiler and the C language augmentation are
not yet implemented, it is difficult to fully evaluate the
effectiveness and efficiency of the DBMS. In general, it can
be said that the UNIX file system seems to be a very
hospitable environment for developing a DBMS; however, the
operating system facilities of UNIX are not nearly as well
suited to supporting this development. Below, the DBMS is
measured against some of the goals of a DBMS as they are
presented in Section II.A.4.

1. Concurrent Retrieval and Update.

The DBMS cannot provide the ability to perform concurrent
update of the same area by two users. Although the ability to
open an area for unprotected update exists, its use can be
disastrous. Concurrency between update and retrieval in an
area causes no integrity problems; however, the user doing
retrievals has no way of knowing whether the records he is
accessing are being modified.

2. A Variety of Search Strategies.

The DBMS supports every form of access path specified by the
CODASYL DDLC. These forms are direct, hashed, sequential and
indexed.

3. Centralized Placement Control.

Placement control by the DBMS is a purely logical mapping,
with the UNIX file system providing centralized placement
control for the data onto physical media.

4. Device Independence.

Device independence is almost total for any file in UNIX. The
DBMS (and therefore the user program) is unaware of either the
types or the number of devices in the system.

5. Privacy of Data.

The complete privacy mechanism in the CODASYL design has been
implemented. The DBMS itself should be relatively secure. A
program could be written to call the schema DBM repeatedly and
determine a privacy key by trial and error, but using data
base procedure privacy locks which notify a security console
or terminate the program when a violation occurs can greatly
reduce the effectiveness of trial-and-error "lock picking".

UNIX itself, however, is too easily penetrated [Ref. 21].
Locating and plugging all the holes in UNIX may be impossible.
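The lock-picking countermeasure from item 5 can be sketched as a data base procedure that counts failed attempts. Everything here is hypothetical (the struct `privacy_lock`, the function `check_lock`, the result codes and the failure limit); it only illustrates how refusing all further attempts after a few violations makes a trial-and-error search impractical.

```c
#include <assert.h>
#include <string.h>

enum lock_result { LOCK_OPEN, LOCK_DENIED, LOCK_TERMINATE };

struct privacy_lock {
    const char *secret;     /* the privacy key */
    int failures;           /* wrong keys seen so far */
    int max_failures;       /* hypothetical limit before termination */
};

/* Check a key offered by the program. Once the limit is reached the
 * procedure reports LOCK_TERMINATE (notify the console or end the
 * program) and refuses even the correct key from then on. */
enum lock_result check_lock(struct privacy_lock *l, const char *key)
{
    assert(l != NULL && key != NULL);
    if (l->failures >= l->max_failures)
        return LOCK_TERMINATE;          /* already tripped */
    if (strcmp(l->secret, key) == 0)
        return LOCK_OPEN;
    if (++l->failures >= l->max_failures)
        return LOCK_TERMINATE;          /* this failure trips the lock */
    return LOCK_DENIED;
}
```

With a limit of three, a guessing program gets at most three tries per run, whereas an unprotected lock could be probed indefinitely.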
6. Independence of Schema and Sub-schema.

The DBMS provides the maximum amount of independence possible
under the CODASYL design. In fact, user programs could be
compiled without any reference at all to the schema.

B. Recommendations.

1. Enhancement for Concurrent Update.

In order to enhance the ability for concurrent use of a data
base, the following approaches might be taken.

a. Centralized Schema DBM.

UNIX could be modified to provide a mechanism for
establishing interprocess communication to any designated
process. This would enable implementation of a centralized
schema DBM as the CODASYL DDLC intended. This alternative
remains impractical for the reasons given in Section III.C,
i.e., the schema DBM would run out of file resources.
Additionally, the UNIX modification would have an unknown but
probably major impact on the operating system's design.

b. System P and V Call.

UNIX could be modified to provide a system call for P and V
operators. For a discussion of P and V operators see Ref. 22.
If a fast P and V facility were available, a schema DBM could
temporarily halt all update of or access to an area while
performing modifications. "Keep" status could be implemented,
if desired, by storing indicators in the record itself. The
impact of such system calls on UNIX's design philosophy is
expected to be minimal.

Additionally, existing communications between schema DBM
programs could be speeded up. Specifically, the methods used
with "opendum" and "indexdum" to lock out simultaneous update
are essentially a "test and set" operation which could be
implemented more efficiently with P and V system calls.
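The semantics of the proposed P and V operators can be illustrated with a counting semaphore. This sketch assumes a C11 compiler with `<stdatomic.h>`; the names `area_sem`, `sem_init_area`, `try_P`, `P` and `V` are hypothetical, and the user-level spin loop only stands in for what a real system call would do by suspending the waiting process. The compare-and-swap inside `try_P` is the atomic "test and set" step that the text says is currently emulated through files.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    atomic_int count;   /* how many more processes may enter the area */
} area_sem;

void sem_init_area(area_sem *s, int initial)
{
    assert(initial >= 0);
    atomic_init(&s->count, initial);
}

/* One attempt at P: atomically decrement the count if it is positive. */
bool try_P(area_sem *s)
{
    int c = atomic_load(&s->count);
    while (c > 0) {
        /* On failure c is refreshed with the current value; retry. */
        if (atomic_compare_exchange_weak(&s->count, &c, c - 1))
            return true;
    }
    return false;
}

/* P: wait until the area may be entered, then claim it. */
void P(area_sem *s)
{
    while (!try_P(s))
        ;   /* spin; a kernel P would put the caller to sleep here */
}

/* V: release the area, allowing one waiting process to proceed. */
void V(area_sem *s)
{
    atomic_fetch_add(&s->count, 1);
}
```

Initialized to 1, the semaphore gives a schema DBM exclusive use of an area between its P and its V.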

2. Enhancement for Faster Access.

In the absence of usage data, it is difficult to estimate the
access response speed of the DBMS. However, a logical
extension to the access methods provided by the DBMS would be
multilevel indices. The index structure now in the DBMS is
essentially an index sequential access scheme which could be
upgraded to the multilevel structure which is typical of such
indices. For a discussion of multilevel index sequential
access methods, see Ref. 13. Use of a two-level index would
divide the average number of records scanned to find the
right index block by the average number of entries in the
index blocks.
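The claimed saving can be checked with a back-of-envelope model. The assumptions are ours, not measurements from the thesis: keys are uniformly distributed, index blocks are read one after another until the right one is found, and the second level holds one entry per first-level block. The function names are hypothetical.

```c
#include <assert.h>

/* One level: on average half of the index blocks are read. */
double avg_blocks_one_level(double index_blocks)
{
    assert(index_blocks > 0);
    return index_blocks / 2.0;
}

/* Two levels: the sequential scan now runs over the second level,
 * which has index_blocks / entries_per_block blocks, and is followed
 * by one direct read of the selected first-level block. */
double avg_blocks_two_level(double index_blocks, double entries_per_block)
{
    assert(index_blocks > 0 && entries_per_block > 0);
    return index_blocks / entries_per_block / 2.0 + 1.0;
}
```

With 1000 index blocks and 50 entries per block, the sequential part of the search falls from 500 block reads to 10 (a factor of 50, the number of entries per block), plus the single direct read.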

3. Automatic Garbage Collection.

Since only the Data Administrator can initiate garbage
collection, the growth rate of wasted space in a data base may
become a problem. Some consideration should be given to having
the schema DBM automatically garbage collect when the waste in
an area reaches a critical level. This thesis did not address
the problem of automatic garbage collection due to the
difficulty of determining what amount of wasted space is
critical.
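Because the thesis leaves the critical waste level open, this sketch takes it as a parameter. It is a hypothetical illustration of the trigger only (the struct `area_space` and the function `should_collect` are invented names), not of garbage collection itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical space bookkeeping for one area. */
struct area_space {
    long used_bytes;    /* bytes occupied by live records */
    long wasted_bytes;  /* bytes left behind by deleted records */
};

/* Return true when the fraction of wasted space has reached the
 * critical level, i.e. when the schema DBM should start collecting.
 * Intended to be checked after each delete (an assumed policy). */
bool should_collect(const struct area_space *a, double critical_fraction)
{
    long total = a->used_bytes + a->wasted_bytes;
    assert(critical_fraction > 0.0 && critical_fraction <= 1.0);
    if (total == 0)
        return false;
    return (double)a->wasted_bytes / (double)total >= critical_fraction;
}
```

Choosing `critical_fraction` is exactly the open question noted above; the sketch only moves it into one tunable place.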
APPENDIX A. C LANGUAGE DDL AND DML.


A. C Language DDL.

The DDL in C is designed to interface the sub-schema
description with the schema description with a minimal
requirement for path information from the user and maximal
similarity with existing C language constructs. In the
following discussion, words enclosed in apostrophes denote
variable data. When a 'lock' is specified, the data item must
be of type "character pointer". All 'db identifiers'
specified must match the appropriate data base names in the
schema after translation of lower case into upper case and
underscore into dash. All DDL statements are enclosed in a
"ddl" routine with the following format:

     "ddl{...ddl statements...}".

The ddl routine should appear prior to any DML statements. It
may be contained in a file INCLUDE'd at an appropriate point in
the program. The statements, in order of appearance, are as
follows.

1. Schema Entry.

The "schema" statement identifies the schema name and its
privacy lock. Its format is

     "schema 'db identifier' with lock 'lock';".

The 'db identifier' must match the schema name and the lock
must match the privacy lock for the schema entry (see Ref. 2,
section 3.1.0).

2. Area Entries.

For each area to be used, an area entry must be made. These
entries must be in the same order as the area entries in the
schema. The format of an area entry is

     "area 'db identifier list' lock is 'lock list';",

where a 'db identifier list' is one or more comma-separated
'db identifier's. A 'lock list' is one or more comma-separated
lock entries of the form

     "'lock' for 'modifier' 'function'",

where the modifier is optional. For an area entry, the
allowable modifiers are "exclusive" and "protected" and the
allowable functions are "update" and "retrieval". The 'db
identifier's and 'lock's must match the area names and privacy
locks in the schema area entries (see Ref. 2, section 3.2.0).

3. Record Entries.

Record entries must be made in the same order as the
corresponding record entries in the schema. A record entry is
similar in construction to a C language structure definition.
Its format is

     "record 'db identifier' lock is 'lock list' {'item list'}",

where the "lock is 'lock list'" phrase is optional. The 'lock
list' for records has no modifiers defined. The functions
allowed are "insert", "remove", "store", "delete", "modify" and
"find". The 'db identifier' and 'lock's must correspond to
those of the schema record entry (see Ref. 2, section 4.2.3).

The item list in a record entry is composed of a series of
item entries of the following form:

     "'type specifier' 'db identifier' ['constant expression']
          lock is 'lock list';",

where the "['constant expression']" and "lock is 'lock list'"
phrases are optional. A type specifier is of the form "int",
"char", "float", "double", "dbkey" or "struct {'item list'}".
These data types are identical to those of the C language with
the addition of "dbkey". An item of type "dbkey" appears to be
an array of four characters to the C user. The 'lock list' for
items has no modifiers defined. The permissible functions are
"store", "get" and "modify". The item entries may appear in any
order in the item entry list, with the following restrictions.
Items must appear with the same records as in the schema. The
data type of the item must be compatible with the schema item.
Items of type "struct" must correspond to repeating groups in
the schema and have the same dimensionality as in the schema.
Items appearing in a repeating group in the schema must appear
in the item list of the structure corresponding to that
repeating group. Any item in the schema record description may
be omitted.

The record names can be used in non-DML statements as
structures whose format is identical to the record entry.
These record structures are global names and contain the
current record of the respective type.

4. Set Entries.

For each set to be referenced, a set entry of the following
format must exist:

     "set 'db identifier' lock is 'lock list' owner is
          'db identifier' *'identifier' {'member list'};",

where the "lock is 'lock list'" phrase is optional. The 'lock
list' for sets has no modifiers. The defined functions are
"insert", "remove" and "find". The set name 'db identifier' and
'lock's must match those of the corresponding set entry in the
schema description, and all set entries must be in the same
order as in the schema description. The second 'db identifier'
must match the owner name of the set. The member list is
composed of one or more member entries.

5. Member Entries.

A member entry has the form

     "member 'db identifier' *'identifier'
          lock is 'lock list';",

where the "lock is 'lock list'" phrase is optional. The 'lock
list' for members has no modifiers defined and the defined
functions are "insert", "remove" and "find". The 'db
identifier' must be the name of a record defined in the schema
as a member record for the set being described.

The *'identifier's named in the set and member entries become
global pointers to the appropriate record structure. These
pointers can be used to reference the current owner record and
member record, respectively, of the set. In addition, the set
name 'db identifier' is the name of a character array which
holds the current record of the set.


B. C Language DML.

The DML has several global names and functions associated
with it. Besides the record, item and set names from the ddl
routine, there are the pointers "areaname" and "recname" and
the structure "error". The "areaname" pointer contains the
address of the area array containing the current record of the
process. The "recname" pointer contains the address of the
record structure for the current record of the process. Note
that "recname" provides the user with the record type of the
current record of the process, but the current record of the
process may not be the current record of that type, and
therefore the record pointed to may not be the current record
of the process. The current record of the process will always
be available in the area array pointed to by "areaname".
"Areaname" and "recname" are set whenever a find or store
function is executed. The "error" vector is a structure with
the following format:


     struct {
          int  status;
          int  type;
          char *set;       /* pointer to set array for error */
          char *record;    /* pointer to record for error */
          char *area;      /* pointer to area array for error */
          int  count;
     } error;


The error codes for C DML functions are designed to be
compatible with the error codes defined in Ref. 3 for the
COBOL DML. Additionally, a pointer called "areaid" and an
array of four characters called "keyname" exist for the store
function.


The use of the DML causes certain identifiers to be generated
globally; hence these should be treated as reserved words by
the user. These reserved words are:

     .  all area, record and set names
     .  permit
     .  store
     .  member
     .  areaname
     .  recname
     .  error
     .  dbopen
     .  dbclose
     .  find
     .  modify
     .  current
     .  areaid
     .  get
     .  key
     .  remove
     .  empty
     .  owner
     .  duplicate
     .  keyname
     .  insert

1. DML Expressions.

The DML introduces two additional expression types into C.
These are DML logical expressions and DML record selection
expressions.

a. DML Logical Expressions.

These expressions evaluate to a true/false value and can be
used in a manner identical to normal C logical expressions.
Their forms are

(1) "'db identifier' empty", where 'db identifier' must appear
as a set name in a set entry. The expression evaluates true if
and only if the current set of the type specified has no
members.

(2) "member of 'db identifier'", where 'db identifier' must be
a set name. It evaluates true if and only if the current
record of the process is a member of a set of the type
specified.

(3) "owner of 'db identifier'", where 'db identifier' must be a
set name. It evaluates true if and only if the current record
of the process is the owner of a set of the type specified.
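The three logical expressions can be modelled as C predicates over a hypothetical in-memory set type. In this sketch (`struct set_occurrence`, `struct set_type`, `set_empty`, `member_of` and `owner_of` are invented names) records are identified by int data base keys, and the DBMS's real set chains are replaced by arrays.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct set_occurrence {
    int owner;              /* data base key of the owner */
    const int *members;     /* data base keys of the members, in set order */
    size_t nmembers;
};

struct set_type {
    const struct set_occurrence *occ;
    size_t nocc;
    size_t current;         /* index of the current occurrence */
};

/* "'set' empty": the current occurrence has no members. */
bool set_empty(const struct set_type *s)
{
    assert(s->current < s->nocc);
    return s->occ[s->current].nmembers == 0;
}

/* "member of 'set'": rec is a member of some occurrence of the type. */
bool member_of(const struct set_type *s, int rec)
{
    for (size_t i = 0; i < s->nocc; i++)
        for (size_t j = 0; j < s->occ[i].nmembers; j++)
            if (s->occ[i].members[j] == rec)
                return true;
    return false;
}

/* "owner of 'set'": rec owns some occurrence of the type. */
bool owner_of(const struct set_type *s, int rec)
{
    for (size_t i = 0; i < s->nocc; i++)
        if (s->occ[i].owner == rec)
            return true;
    return false;
}
```

In the DML, `rec` would always be the current record of the process; passing it explicitly keeps the sketch testable.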

b. DML Record Selection Expressions.

These expressions result in a data base key which can be used
to find a record. They are evaluated in part within the user
program but must be validated by the schema DBM program. They
must appear only in DML function argument lists. The forms
possible are as follows.

(1) Explicit Key. The simplest form of record selection
expression is by explicit key. The format is "'key'". The
'key' must be either an item of type "dbkey" or evaluate to a
character pointer. The contents of the 'key' are used as a
data base key. This form is useful for accessing records whose
keys are known. It can also be used for applying currency
which has previously been suppressed (e.g.,
"find(key(process));" applies all appropriate currency to the
current record of the process).

(2) Owner Record. Selection of an owner record has the format
"'db identifier' owner of 'key'", where the "of 'key'" phrase
is optional. The 'db identifier' is a set name and the 'key'
is an explicit key. If the "of 'key'" phrase is not used, the
owner record of the current instance of the set specified is
selected; otherwise the owner, in the set type specified, of
the record identified by the 'key' is selected.

(3) Relative Selection. This form allows the selection of a
particular record from an area or set based on a location
criterion. The expression has the format "'criterion' 'db
identifier' of 'db identifier'". The first 'db identifier' is
optional and is the name of a record type. The second 'db
identifier' is the name of an area or set type. The
'criterion' determines the location within the area or set
from which the record will be selected. The allowed criteria
are "next", "prior", "first", "last" and an expression which
evaluates to an integer. When the record type is included,
only occurrences of that record type will be considered for
selection. The 'criterion' refers to the ordering of the area
or set. The ordering of an area is considered to be ascending
sequence by data base key. "Next" and "prior" are relative to
the current record of the area or set. If the current record
of the set is the owner record, "next" and "prior" are
equivalent to "first" and "last" respectively.

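Relative selection within one set occurrence can be sketched as follows. The model is hypothetical: members are held in set order in an array, position -1 stands for "the owner is current", `want == 0` means "any record type" (an assumption of this sketch), and -1 is returned for the end-of-set condition. The integer criterion is not modelled, and `select_relative` and the enumerators are invented names.

```c
#include <assert.h>

enum criterion { FIRST, LAST, NEXT, PRIOR };

/* types[i] is the record type of member i; current is the position of
 * the current record of the set (-1 = owner). Returns the selected
 * position, or -1 if no qualifying record exists. */
int select_relative(const int *types, int n, int current,
                    enum criterion c, int want)
{
    int pos, step;
    assert(n >= 0 && current >= -1 && current < n);
    /* With the owner current, next and prior act like first and last. */
    if (current == -1 && c == NEXT)  c = FIRST;
    if (current == -1 && c == PRIOR) c = LAST;
    switch (c) {
    case FIRST: pos = 0;           step = 1;  break;
    case LAST:  pos = n - 1;       step = -1; break;
    case NEXT:  pos = current + 1; step = 1;  break;
    default:    pos = current - 1; step = -1; break;   /* PRIOR */
    }
    /* Skip members of other record types when a type is named. */
    for (; pos >= 0 && pos < n; pos += step)
        if (want == 0 || types[pos] == want)
            return pos;
    return -1;
}
```

For a set whose members have types 1, 2, 1, 2, asking for the next type-1 record after position 0 yields position 2, and asking again yields the end-of-set result.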
(4) CALC Key. If a record type is defined in the schema as
having a location mode of CALC, the format "duplicate 'db
identifier'", where "duplicate" is optional, may be used. The
'db identifier' is a record type defined in the schema to have
location mode CALC. Prior to the evaluation of the record
selection expression, the items in the record designated as
part of the CALC key must have been initialized to the desired
values.

If the "duplicate" phrase is included, the current record of
the process must be of the specified type and have the same
CALC key as in the record buffer. If these conditions are
satisfied, a synonym to the current record is selected.
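A minimal sketch of CALC placement and duplicate selection, assuming a simple string hash and bucket chains. The hash function, bucket count and all names (`calc_file`, `calc_store`, `calc_find`, `calc_find_duplicate`) are hypothetical and not the DBMS's actual scheme; data base keys are plain record indexes.

```c
#include <assert.h>
#include <string.h>

#define NBUCKETS 8      /* hypothetical number of CALC buckets */
#define MAXREC   16

struct calc_file {
    char key[MAXREC][21];   /* CALC key of each stored record */
    int next[MAXREC];       /* chain within a bucket, -1 ends it */
    int head[NBUCKETS];     /* first record of each bucket, -1 if none */
    int nrec;
};

static unsigned calc_hash(const char *k)
{
    unsigned h = 0;
    while (*k)
        h = h * 31u + (unsigned char)*k++;
    return h % NBUCKETS;
}

void calc_init(struct calc_file *f)
{
    f->nrec = 0;
    for (int b = 0; b < NBUCKETS; b++)
        f->head[b] = -1;
}

/* Store a record at the end of the chain for its CALC bucket. */
int calc_store(struct calc_file *f, const char *key)
{
    assert(f->nrec < MAXREC && strlen(key) < sizeof f->key[0]);
    int r = f->nrec++;
    strcpy(f->key[r], key);
    f->next[r] = -1;
    int *link = &f->head[calc_hash(key)];
    while (*link != -1)
        link = &f->next[*link];
    *link = r;
    return r;
}

/* CALC selection: the first record in the bucket with a matching key. */
int calc_find(const struct calc_file *f, const char *key)
{
    for (int r = f->head[calc_hash(key)]; r != -1; r = f->next[r])
        if (strcmp(f->key[r], key) == 0)
            return r;
    return -1;
}

/* "duplicate": the next record after `current` with the same CALC key,
 * skipping other keys that merely hash to the same bucket. */
int calc_find_duplicate(const struct calc_file *f, int current)
{
    for (int r = f->next[current]; r != -1; r = f->next[r])
        if (strcmp(f->key[r], f->key[current]) == 0)
            return r;
    return -1;
}
```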
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(5)  Data  Value.  Selection  by  data  value  is 
possible  using  the  format  "duplicate  ' db  identifier'  via 
•set  select'  '  db  identifier'  ==  '  db  identifier'  ...",  where 
the  phrase  "=  =  • db  identifier'  . . . " ,  and  the  word  "dupli- 
cate" is  optional.  The  first  '  db  identifier'  is  a  record 
name?  the  second*  a  set  name?  and  the  string  of  ' db 
■identifier's  is  made  up  of  items  in  the  named  record  tyoe. 
The  'set  select'  phrase  consists  of  either  the  word 
"current"  or  the  format  "'db  identifier'  ...  select".  The 
' db  identifier'  string  in  the  'set  select'  phrase  is  of  made 
uo  of  the  items  needed  for  the  selection  path  specified  in 
the  SELECTION  clause  for  the  the  named  record  type  and  named 
set  type . 

If  the  word  "duplicate"  is  omitted*  the 
expression  selects  the  first  record  occurence  in  the 
appropriate  set  with  values  matching  those  of  the  items  in 
the  string  of  ' db  identifier's.  If  the  string  is  not  speci- 
fied* the  first  record  of  the  named  type  in  the  set  is 
selected.  When  the  list  of  items  is  specified*  the  items  in 
the  list  must  have  been  initialized  to  the  desired  values 
prior  to  evaluation  of  the  record  selection  expression.  If 
"current"  is  included*  the  current  instance  of  the  named  set 
is  used*  otherwise  the  set  used  is  selected  on  the  basis  of 
the  selection  criteria  in  the  schema  for  the  named  record 
tvoe  as  a  member  of  the  named  set  tyoe. 

If the word "duplicate" is included, "current" must be included
and the item list is mandatory. The record selected will be the
next record in the set matching the current record of the
process in the fields named in the item name string.
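The difference between the plain and the "duplicate" forms can be illustrated with a small C sketch that does not come from the implementation. It models one set occurrence as an array of records in set order; the record layout (`struct emp` with items `dept` and `grade`), the item flags and both function names are hypothetical. Without "duplicate" the scan starts at the first member and compares against caller-supplied values; with "duplicate" it starts just after the current record and compares against that record's own values.

```c
#include <stddef.h>

/* Hypothetical record type with two items usable in an item list. */
struct emp {
    int dept;
    int grade;
};

/* Bit flags naming which items take part in the comparison. */
enum { ITEM_DEPT = 1, ITEM_GRADE = 2 };

static int items_match(const struct emp *a, const struct emp *b, unsigned items)
{
    if ((items & ITEM_DEPT) && a->dept != b->dept)
        return 0;
    if ((items & ITEM_GRADE) && a->grade != b->grade)
        return 0;
    return 1;
}

/* Plain form: first member whose named items equal `want`.
 * An empty item list (items == 0) selects the first member.
 * Returns the position in the set, or -1 if nothing matches. */
int select_first(const struct emp *set, size_t n,
                 const struct emp *want, unsigned items)
{
    for (size_t i = 0; i < n; i++)
        if (items_match(&set[i], want, items))
            return (int)i;
    return -1;
}

/* "duplicate" form: next member after the current record (position
 * `cur`) that matches the current record in the named items. */
int select_duplicate(const struct emp *set, size_t n, size_t cur,
                     unsigned items)
{
    for (size_t i = cur + 1; i < n; i++)
        if (items_match(&set[i], &set[cur], items))
            return (int)i;
    return -1;
}
```

Repeated "duplicate" selections, each starting from the record just found, walk through every member sharing the named item values, which is why the item list is mandatory in that form.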

2.   DML  Routines. 

The ability to access, retrieve and update records is provided
by DML routines. The routines are divided into area
manipulations (dbopen, dbclose), record manipulations (key
through delete) and set manipulations (insert, remove). The
permit function does not fit into any of these categories.
Considerable overlap between categories exists among the other
functions. All functions have the same form as normal C
subroutine calls. In the following descriptions, every error
code listed consists of a two-decimal-digit major code
specifying the function and a two-decimal-digit minor code
specifying the particular error.
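Because each code packs the function and the specific condition into two pairs of decimal digits, a caller can recover both parts with integer division. The C sketch below is illustrative only: the function names are hypothetical, and the major-code table was assembled from the error lists in this section rather than taken from the implementation. Codes are held as plain decimal integers; written with a leading zero in C (e.g. `0929`) they would be read as octal constants or rejected by the compiler.

```c
#include <stdio.h>
#include <string.h>

/* Major (function) codes as they appear in the error lists of this
 * section; collected from the text, not from the DBM source. */
static const struct { int major; const char *func; } dml_funcs[] = {
    {0, "permit"},  {1, "dbclose"}, {2, "delete"}, {3, "find"},
    {5, "get"},     {7, "insert"},  {8, "modify"}, {9, "dbopen"},
    {11, "remove"}, {12, "store"},  {13, "key"},
};

#define N_FUNCS (sizeof dml_funcs / sizeof dml_funcs[0])

/* Split a code such as 0929 (held as the integer 929). */
void split_code(int code, int *major, int *minor)
{
    *major = code / 100;
    *minor = code % 100;
}

/* Name of the function that produced `code`, or "unknown". */
const char *dml_func_name(int code)
{
    int major, minor;
    split_code(code, &major, &minor);
    for (size_t i = 0; i < N_FUNCS; i++)
        if (dml_funcs[i].major == major)
            return dml_funcs[i].func;
    return "unknown";
}

/* Reverse lookup: function name -> major code, or -1. */
int dml_major_of(const char *func)
{
    for (size_t i = 0; i < N_FUNCS; i++)
        if (strcmp(dml_funcs[i].func, func) == 0)
            return dml_funcs[i].major;
    return -1;
}

/* Print a code the way the text writes it, e.g. 361 -> "0361". */
void format_code(int code, char *buf, size_t len)
{
    snprintf(buf, len, "%04d", code);
}
```

For example, the minor code 61 ("no call to the permit function") combined with major code 3 gives 361, printed as "0361" in the find error list.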

a. Permit. The permit function must be called only once, and
before any other DML function. It causes validation of the
subschema by the schema DBM program and establishment of the
required privacy permissions. If the schema lock is violated,
an error code of 0010 is returned in error.status. If any
other privacy lock check fails, no indication is given until
the user program attempts to use the feature that was not
properly unlocked. If a mismatch occurs between the schema
definition and the subschema definition, error code 0060 will
be returned in error.status. When this occurs, error.count will
contain the number of incompatibilities; error.type will be 1,
2 or 3 depending on whether the first error encountered was in
an area, record or set entry; and error.area, error.record and
error.set will indicate the first erroneous entry in the area,
record and set entries respectively. The entry number returned
identifies entries in the C subschema and is zero if no errors
were encountered. If any other DML function is attempted
prior to permit, error code nn61 will be returned, where nn
indicates the function attempted.

b. Dbopen. Prior to processing any records in an area, the
user program must call dbopen to open the area. Dbopen
parameters are an opening mode and a list of area names. The
opening mode is an octal code formed as follows. If the
low-order bit is 1, the mode is for update and retrieval;
otherwise it is retrieval only. The next two more significant
bits are 0 if concurrent update is permitted, 1 if concurrent
retrieval but no concurrent update is permitted (protected
mode), and 2 or 3 if no concurrent use is permitted (exclusive
mode). All the areas in the parameter list are opened in the
specified mode. If no area list is specified, all the areas in
the subschema are opened. For all temporary areas opened, a
mode of exclusive update is assumed no matter what mode is
specified. Modes allowing concurrent processes to update areas
are included for compatibility purposes; however, unless the
implementation of the data base management system is modified,
these modes can cause severe integrity problems.
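The bit layout of the opening mode lends itself to a pair of small helpers. The following is a minimal sketch; the names and the `enum sharing` type are hypothetical and only restate the encoding described above, including the rule that temporary areas are always opened for exclusive update. Octal literals are used deliberately, since the text describes the mode as an octal code.

```c
#include <stdbool.h>

/* Sharing component held in bits 1-2 of a dbopen opening mode. */
enum sharing { SHARE_UPDATE = 0, SHARE_PROTECTED = 1, SHARE_EXCLUSIVE = 2 };

/* Build an opening mode: bit 0 = update, bits 1-2 = sharing. */
unsigned make_open_mode(bool update, enum sharing s)
{
    return ((unsigned)s << 1) | (update ? 01u : 00u);
}

bool mode_is_update(unsigned mode)
{
    return (mode & 01) != 0;
}

/* Both 2 and 3 in bits 1-2 mean exclusive. */
enum sharing mode_sharing(unsigned mode)
{
    unsigned s = (mode >> 1) & 03;
    return s >= 2 ? SHARE_EXCLUSIVE : (enum sharing)s;
}

/* Temporary areas are opened for exclusive update whatever is asked. */
unsigned effective_mode(unsigned mode, bool temporary)
{
    return temporary ? make_open_mode(true, SHARE_EXCLUSIVE) : mode;
}
```

Under this encoding, protected update is 03, exclusive retrieval is 04 and exclusive update is 05.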

To successfully execute a find, store, delete or close, the
appropriate areas must be open as follows: all areas which
contain any record occurrence which would be deleted or removed
by a delete statement, and all areas which are the object of a
close function. If any of these functions fails to meet these
conditions, error status nn01 is returned, where nn indicates
the function attempted.

In addition to the areas containing the object records of the
functions cited above, there are additional (i.e. implicit)
areas which could be impacted by DML functions. This impact
can take two forms: the DBM program requires information
contained within the implicit area (in which case the area must
be "available"), or the DBM program must alter the information
contained in records in the implicit area (in which case the
area must not only be available, but must also permit the
necessary alteration). Implicit areas requiring modification
are termed "affected".

A user may assume the following areas will be affected: all
areas containing any record which participates in a set
occurrence into which a record is to be inserted or from which
a record is to be removed or deleted, and all areas containing
any records which participate in any set occurrence whose
membership or sequence is altered by a store or modify
function. If an implicit area which is affected is not open,
error code nn21 will be returned to error.status, where nn
indicates the function attempted.

To successfully execute insert, remove, store, delete or modify
functions, all explicitly and implicitly affected areas involved
must be opened for update. If any of the involved areas is open
for retrieval only, an error code of nn09 will be returned to
error.status, where nn indicates the function attempted.

Record occurrences which are in the search path of a find, or
of an implicit find which is the result of a store, remove or
delete function, need only be in areas which are available. For
an area to be available, it must not be opened in an exclusive
mode by a concurrent process. Although such an area need not be
open, the full overhead of a dbopen and dbclose will be incurred
for each implicit reference to an area which is not open. If an
implicit area is not available, error code nn18 will be returned
to error.status, where nn indicates the function attempted.

Any attempt to execute a dbopen function which would result in
a usage mode conflict for any area will cause every area in the
call to fail to open. Additionally, error code 0929 will be
returned in error.status. A usage mode conflict will occur
under the following conditions: any update mode on an area
opened in an exclusive or protected mode by another process;
any protected mode on an area opened for update by another
process; exclusive mode on an area opened in any mode by
another process; and any mode on an area opened for exclusive
use by another process. In order to prevent deadlock, a process
should open all areas needed for exclusive or protected use in
one dbopen. If a dbopen fails because of a usage conflict, the
process should close any other open areas obtained previously.
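The four conflict conditions and the availability rule can be written as predicates. This is a sketch under assumptions: the `struct usage` representation and all function names are hypothetical, the protected/exclusive distinction follows the opening-mode description above, and `dbopen_check` models only the all-or-nothing outcome of a single dbopen (one other holder per area), not the locking actually performed by the integrity routines.

```c
#include <stdbool.h>

/* Hypothetical decoded form of an area usage mode. */
enum share { USE_SHARED = 0, USE_PROTECTED = 1, USE_EXCLUSIVE = 2 };
struct usage { bool update; enum share share; };

/* True if `req` cannot be granted while another process holds `held`. */
bool usage_conflict(struct usage req, struct usage held)
{
    if (req.update && held.share != USE_SHARED)    /* update vs protected/exclusive */
        return true;
    if (req.share == USE_PROTECTED && held.update) /* protected vs update */
        return true;
    if (req.share == USE_EXCLUSIVE)                /* exclusive vs any mode */
        return true;
    if (held.share == USE_EXCLUSIVE)               /* any mode vs exclusive */
        return true;
    return false;
}

/* An area is available for implicit references unless another
 * process holds it in an exclusive mode. */
bool area_available(const struct usage *held, int nheld)
{
    for (int i = 0; i < nheld; i++)
        if (held[i].share == USE_EXCLUSIVE)
            return false;
    return true;
}

/* One dbopen over n areas: area i is requested as req[i]; if busy[i],
 * another process holds it as held[i]. Any conflict fails the call. */
int dbopen_check(const struct usage *req, const struct usage *held,
                 const bool *busy, int n)
{
    for (int i = 0; i < n; i++)
        if (busy[i] && usage_conflict(req[i], held[i]))
            return 929;   /* error code 0929: no area is opened */
    return 0;
}
```

Checking every requested area before opening any of them is what makes the all-or-nothing rule cheap to honour, and it is also why the text recommends requesting all exclusive or protected areas in a single call.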

If a privacy lock is violated, error code 0910 is returned in
error.status. If an area being opened was already open, warning
error code 0928 is returned in error.status. The total number of
errors encountered is returned in error.count.

c. Dbclose.

When an area is no longer needed, it may be released for use by
other processes with the dbclose function. Dbclose parameters
are a list of area names. All the areas in the list are closed.
If the parameter list is omitted, all open areas are closed.
After the dbclose is executed, all current records in closed
areas cease to be current. If any area named in the parameter
list is not open, error code 0101 is returned to error.status
and error.count will contain the number of errors detected.

When the process terminates (even abnormally), no dbclose is
needed, as all areas will be closed. If the dbclose function is
executed on a temporary area, the data within the area is not
lost and the area can be reopened and processed. When the
process terminates, however, all temporary areas, open or
closed, are lost.

d. Find.

The find function allows the user program to select a record
from the data base and make it the current record of the run
unit and, selectively, of the appropriate record and set types.
The parameters are a record selection expression and a suppress
code. The record selection expression is discussed in 1.b above.
The suppress code is an octal code whose least significant bit
indicates set suppression and whose next least significant bit
indicates record suppression. If set suppression is indicated,
additional find parameters are permitted, each of which is the
name of a set type.

Execution of a successful find function causes the selected
record to become the current record of the process, of the area
in which it is located, of its record type and of all set types
in which it participates as an owner or member record. If record
or set suppression is indicated, the object record does not
become current for those types. When the list of set names is
included, currency update is suppressed only in the named sets.
After a find, the data fields of the record are not yet
available, but its data base key can be derived (through the key
function), a pointer to its area buffer is in "areaname" and a
pointer to the appropriate record type structure is in
"recname". The record can now be retrieved by the get function.
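How the suppress code and the optional list of set names combine can be stated as two predicates. A minimal sketch with hypothetical function and macro names; the set names `works_in` and `manages` used in the check below are invented for illustration. Per the text, the process and area currencies are always updated by a successful find, so they are not modeled.

```c
#include <stdbool.h>
#include <string.h>

#define SUPPRESS_SET    01  /* least significant bit of the suppress code */
#define SUPPRESS_RECORD 02  /* next least significant bit */

/* Does a successful find make the record current for its record type? */
bool updates_record_currency(unsigned suppress)
{
    return (suppress & SUPPRESS_RECORD) == 0;
}

/* Does it make the record current for set type `set`? `names` is the
 * optional list of set names passed to find (nnames may be 0). */
bool updates_set_currency(unsigned suppress, const char *set,
                          const char *const *names, int nnames)
{
    if ((suppress & SUPPRESS_SET) == 0)
        return true;            /* no set suppression requested */
    if (nnames == 0)
        return false;           /* suppressed for every set type */
    for (int i = 0; i < nnames; i++)
        if (strcmp(names[i], set) == 0)
            return false;       /* suppressed only in the named sets */
    return true;
}
```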

The following error codes may be returned in error.status by a
find.

0301  The sought record is in an area which is not open.

0318  A record occurrence along the search path of the find is
      in an area under the exclusive control of another process.

0302  A data base key was supplied or developed which is
      incompatible with the areas specified for a record of this
      type.

0307  An end-of-area or end-of-set condition was detected.

0326  No record in the area for selection through CALC key
      satisfies the record selection expression.

0322  Owner record selection is specified and the data base key
      given is for a record which does not participate in a set
      of the desired type.

0323  Relative selection was specified and the specified record
      cannot be in the desired area.

0310  A privacy breach was attempted.

0361  No call to the permit function has been made.

e. Get.

The get function is used to transfer the data values of the
current record of the process into the process's buffers. Its
parameters, which are optional, are a record name and item names
from that record type. If the record type and item names are
specified, only the items named are extracted. If the item names
are not specified, all the items defined in the subschema for
that record are extracted. A get must be executed for a record
before any of its item values can be examined.

The following error conditions may be returned to error.status
for a get.

0513  The current record for the process is unknown.

0510  A privacy breach was attempted.

0520  A record name is specified and the current record of the
      process is not of that type.

0561  No call to the permit function has been made.

0554  Truncation of significance occurred during conversion
      from the schema type to the subschema type for an item.

In all but the last case, no data is transferred to the user
process.

f. Store.

The store function is used to create a new record occurrence in
the data base. It acquires space and a data base key for the new
record occurrence, causes the data items in the record's buffer
to be used in initializing the record, inserts the record into
all sets in which it is an automatic member, and establishes a
new set occurrence of each set type for which the record is
defined as an owner in the schema.

The parameters of the function are a record name, a suppress
code and one or more set names. The record name specifies the
record type to be created. The suppress code and set names are
exactly analogous to those of the find function. For the store
to function properly, the subschema must include the following:
the named record; the "data-base-identifiers" or set specified
in the "LOCATION" mode clause of the record; at least one of the
areas specified in the "WITHIN" clause for the record; all sets
in which the record is defined as an automatic member; and all
"data-base-identifiers", records and sets specified or
referenced in the "SELECTION" and "KEY" clauses of the set
member subentries in which the record is defined as automatic
(see Ref. 2, Section 3.4.0).

Prior to calling store, it is the user program's responsibility
to ensure the following. All data items in the record type
buffer must be initialized. If multiple areas are defined in the
"WITHIN" clause for the record with the "data-base-data-name-1"
option (see Ref. 2, Section 3.3.0) and the "LOCATION" mode is not
direct, "areaid" must contain the desired area pointer. If the
"LOCATION" mode is direct with the "data-base-data-name-1"
option, keyname must have the appropriate data base key stored
in it. If any automatic membership has a "SELECTION" method of
"THRU CURRENT", the current record of the set type must specify
the correct set. All data items mentioned in the selection
clauses of the member entries which are automatic for the
record, and all data items mentioned in the "LOCATION" clause of
the record entry, must be initialized.

If an error occurs during the execution of a store, the new
record is not created, no currency indicators are changed and an
error code is returned to error.status. The following errors and
codes can be encountered.

1201  The object record is to be stored in an area which is not
      open.

1221  A record occurrence which is affected by the store
      function is in an area which is not open.

1209  The object record of the store or some record occurrence
      affected by the store is in an area which is open for
      retrieval only.

1218  Some record occurrence needed by the store for information
      (e.g. search paths) is in an area which is not available.

1212  No data base keys are available.

1211  No media space is available.

1202  A data base key passed by the user or generated via a
      "CALC" procedure is not valid.

1205  The record would violate a "DUPLICATES NOT ALLOWED" clause
      defined for one of the records or sets involved.

1225  For one of the set types involved, a set occurrence cannot
      be matched to the relevant set selection criteria.

1210  A privacy breach was attempted.

1227  A check clause applies and one of the data items did not
      pass.

1223  The area specified for the record is not one of those in
      the record's "WITHIN" clause.

1224  The execution of the store statement would cause a set
      occurrence to have records in both temporary and permanent
      areas.

1219  The value of an item cannot be converted to the type
      specified in the schema for that item.

g. Modify.

The modify function enables the updating of some or all of the
data items defined in the sub-schema for a record and the
changing of set occurrences in which a record participates. The
parameters, which are all optional, are a record name, a list of
items in the record, and set parameters identical to those of the
insert function (see Section B.2.j). If the items are not
specified, every item in the record which is known to the
sub-schema is updated; otherwise only the named items are
updated. If set names are specified, the action taken is
equivalent to a remove function followed by an insert function
for the named sets, with the following exceptions. The record
must be in an occurrence of every set named prior to the modify
function. The set membership in the named sets can be defined as
mandatory or automatic, or both.

The object of the modify is the current record of the process.
All data items to be updated, and all items required for an
insert on the named sets, must be initialized for the modify. If
any of the modified data items are sort control items for a set
occurrence in which membership is retained, the position within
the set is modified accordingly. If any of the items changed are
in a "SEARCH" key clause, the index is updated. The record
becomes the current record of its record type and of all sets
in which it has membership.

If an error occurs during a modify, no data base or currency
changes are made and an error code is returned to error.status.
The possible error conditions and the associated codes are as
follows.

0803  One of the items changed is in a CALC key and the data
      base key would be altered, or an area number specified for
      owner record selection disagrees with the CALC key developed
      for the owner.

0825  A set occurrence satisfying the specified criteria was not
      found.

0822  The record is not currently a member of every specified
      set.

0805  The insertion of the record into a set occurrence would
      violate a "DUPLICATES NOT ALLOWED" clause.

0810  A privacy breach was attempted.

0827  A check clause was failed.

0821  Some record occurrence affected by the modify is in an
      area which is not open.

0809  The object record or some record occurrence affected by
      the modify is in an area which is not open for update.

0818  Some record occurrence which is implicitly referenced is
      in an area open for exclusive use by another process.

0819  A modified data item cannot be converted into the format
      used by the schema for the item.

0824  Insertion of the object record into some set occurrence
      would cause that set occurrence to have members in both
      temporary and permanent areas.

0861  No call to the permit function has been made.

In all cases, no change is made to the data base or to the
currency indicators of the process.

h.   Key. 

The key function allows the extraction of the data base key for
one of the current records. The function needs one parameter,
which may be a record, set or area name or the word "process".
The function returns a pointer to a character array containing
the data base key for the current record of the input
parameter. The key should be treated as read-only.

If an error occurs during the key function, a null pointer is
returned and the error code is returned to error.status. The
possible error conditions are that no current record exists for
the input parameter passed (code 1306) and that no call to the
permit function has been made (code 1361).

i. Delete.

The delete function is used to destroy the current record of
the process, releasing its data base key and storage, and to
selectively delete all of the records which are members of set
occurrences owned by the current record of the run unit. The
function requires a single integer parameter in the range zero
to three, with meaning as follows. A zero parameter causes
deletion of the record if and only if it is not the owner of any
non-empty set occurrences. If the parameter value is one, the
record is deleted, all optional members are removed from its set
occurrences and all mandatory members of its set occurrences are
deleted. If the parameter is two, the action is identical to
that of one except that any records whose membership is optional
and which do not participate in set occurrences owned by a
different record are deleted also. If the parameter value is
three, the record and every member of its set occurrences are
deleted. For any member record deleted, the deletion of the
member records in that record's sets is decided as if that
record were the object of a delete function with the same
parameter as the originally deleted record.
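The recursive rule in the last sentence is easiest to see in code. The following is a minimal in-memory sketch, not the DBM's algorithm: `struct rec`, its fixed-size member list and the `n_owners` counter are hypothetical stand-ins for the real set linkage, only error 0230 is modeled, and updating the member lists of other owners when a shared member is deleted is left out.

```c
#include <stdbool.h>

#define MAX_MEMBERS 8

/* Hypothetical record: the members of the sets it owns, with the
 * membership class of each, and how many set occurrences it is a
 * member of. */
struct rec {
    bool deleted;
    int  n_owners;
    int  n_members;
    int  member[MAX_MEMBERS];   /* indices of member records */
    bool optional[MAX_MEMBERS]; /* true = optional, false = mandatory */
};

/* Delete record r; member records are handled with the same mode. */
static void delete_rec(struct rec *db, int r, int mode)
{
    db[r].deleted = true;
    for (int i = 0; i < db[r].n_members; i++) {
        int m = db[r].member[i];
        if (db[m].deleted)
            continue;
        db[m].n_owners--;                 /* its membership in r's set ends */
        bool del = mode == 3                               /* everything */
                || !db[r].optional[i]                      /* mandatory */
                || (mode == 2 && db[m].n_owners == 0);     /* orphaned optional */
        if (del)
            delete_rec(db, m, mode);
    }
    db[r].n_members = 0;
}

/* Returns 0, 230 (error 0230) or -1 for an invalid parameter. */
int delete_sim(struct rec *db, int r, int mode)
{
    if (mode < 0 || mode > 3)
        return -1;
    if (mode == 0)
        for (int i = 0; i < db[r].n_members; i++)
            if (!db[db[r].member[i]].deleted)
                return 230;   /* owns a non-empty set occurrence */
    delete_rec(db, r, mode);
    return 0;
}
```

With mode 2, an optional member that is also owned by another record survives with one fewer owner, while with mode 3 the same member would be deleted outright.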

If an error occurs during the function, no records are removed
or deleted and an error code is returned to error.status. The
possible errors are as follows.

0230  A delete with parameter zero was attempted and the record
      owns a non-empty set occurrence.

0213  The current record of the process is unknown.

0210  A privacy breach was attempted.

0221  One of the affected member records is in an area which is
      not open.

0209  The current record of the process or some affected record
      is in an area open for retrieval only.

0218  An implicitly referenced record is in an area which is
      open for the exclusive use of another process.

0208  The sub-schema does not know about all the record types
      which would be deleted or removed, or all of the set types
      of set occurrences which would have records removed.

j.   Insert. 

This function causes the current record of the process to become
a member of an occurrence of the specified set types, provided it
is defined as an optional automatic, optional manual or mandatory
manual member of those sets. The parameters, which are optional,
are a record type and one or more set names. Additional
parameters may follow each set name, depending on the selection
criteria for the member entry of the object record's type. If the
root set in the selection path has "DATA-BASE-KEY" specified with
the "data-base-data-name-1" option (see Ref. 2, Section 3.4.0), a
pointer to an array containing a data base key or an item name of
type dbkey must be included. If the root set has the "CALC-KEY"
option with the "data-base-data-names" specified, item names or
pointers to data whose type matches that of the corresponding
items in the CALC key must be included. For each set after the
root in the selection path which uses the "EQUAL TO
data-base-data-name-a" option (see Ref. 2, Section 3.4.0), an item
name or pointer to data which matches the type of the data item
specified in the selection clause must be included. In addition
to the explicit parameters above, all data items needed in the
selection path as specified in the selection clause must be
initialized. If the owner record's "WITHIN" clause specifies
multiple areas, "areaid" or the appropriate data item must be
initialized to the appropriate area. See Reference 2, Section
3.4.11 for a description of the selection clause. If a set name
is specified with no additional parameters, the set used is the
current set of that type.

If set names are specified, the record must not be in an
occurrence of any of the named set types. If no set names are
specified, the record is inserted into the current occurrence of
each set type for which the record is defined as optional
automatic, optional manual or mandatory manual, provided the
record does not already participate in a set of that type. After
the insert, the record becomes the current record of every set
to which it has been added.

If an error occurs during an insert, the data base remains
unchanged, no currency indicators change and the appropriate
error code is returned into error.status. The possible error
conditions are as follows.

0713  The current record of the process is unknown.

0714  Set names are specified and the record is not defined as
      an optional automatic, optional manual or mandatory manual
      member of each of them.

0705  The record, when inserted, would violate a "DUPLICATES NOT
      ALLOWED" clause for some record or set involved.

0710  The current record of some set name specified in a
      "CURRENT" clause of a selection entry is unknown.

0716  The record is already in an occurrence of a set explicitly
      specified or of every set implicitly specified.

0720  The record type was passed as a parameter and disagrees
      with the type of the current record of the process.

0721  A record occurrence which is affected is in an area which
      is not open.

0709  The record inserted or some affected record is in an area
      which is open for retrieval only.

0718  A record occurrence implicitly referenced by the insert is
      in an area which is not available.

0724  Insertion of the record into a set would cause the set to
      have members in both temporary and permanent areas.

0761  No call to the permit function has been made.

k. Remove.

This function is used to cancel the membership of the current
record of the process in specified set occurrences for which the
record's membership is optional. The parameters, which are
optional, are a record name and one or more set names. If set
names are specified, the object record must participate in an
occurrence of at least one of them, and its membership in each of
them is canceled. If no set names are specified, every optional
membership of the record in a set occurrence is canceled.

If an error occurs during the remove, no set memberships are
canceled, no currency information is affected and the error
condition is returned into error.status. The following errors
are possible.

1113  The current record of the process is not known.

1120  A record type parameter was passed and it disagrees with
      that of the current record of the process.

1115  The record is not defined as an optional member of any
      named set type.

1122  The record does not participate in at least one of the
      sets named or, if no sets are named, in at least one of the
      possible sets for which it is optional.

1110  A privacy breach was attempted.

1121  Some record affected by the remove is in an area which is
      not open.

1109  The current record of the process or some affected record
      is in an area which is not open for update.

1118  Some implicitly referenced record is in an area opened for
      exclusive use by another process.

1161  No call to the permit function has been made.

APPENDIX B. FILES ASSOCIATED WITH A SCHEMA.


A.   Files  in  the  Schema  Directory. 

Most of the files associated with a schema are contained in a
directory bearing the name of the schema. This directory becomes
the current directory for the schema DBM program. In the
description of the files within the schema directory, the term
"schemaname" indicates a variable portion of a file name which is
replaced by the name of the particular schema when the files are
named.

1. Source Description File.

The Source Description File contains the schema description in
the source CODASYL DDL form. Its name is "s.schemaname".

2.  Encoded  Description  File. 

The Encoded Description File contains the compiled description of
the schema. It contains data base names and encoded descriptions
for the areas, records and sets in the schema. It is used
principally by the schema DBM program in the initialization
process. Its name is "des.schemaname".

3. Schema DBM Program.

The schema DBM program is the data base manager for the schema.
It is composed of the DBM skeleton routine compiled together with
any data base procedures used in the schema. Its name is
"dbm.schemaname".

4. Schema Library.

This file is optional and, when present, contains data base
procedures unique to the schema. It is named "lib.schemaname".

5. Area Data Files.

These files contain the data for all the defined areas in the
schema which are not designated as temporary. Their names are the
same as the areas which they represent.

6.   Area  Data  Base  Key  Files. 

These files contain the byte offsets associated with each data
base key for the areas which are not designated as temporary. The
files are named by prefixing the area name with "k.".
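The naming rules so far are simple prefixes, which the following C sketch spells out. It is an illustration only: the enum, the function names and the example names "payroll" and "emparea" are hypothetical, and the names are built relative to the schema directory, which the text says is the DBM program's current directory.

```c
#include <stdio.h>
#include <string.h>

/* Kinds of schema-level files described in items 1 to 4. */
enum schema_file { SRC_DESC, ENC_DESC, DBM_PROG, SCHEMA_LIB };

static const char *const schema_prefix[] = {
    [SRC_DESC] = "s.",   [ENC_DESC] = "des.",
    [DBM_PROG] = "dbm.", [SCHEMA_LIB] = "lib.",
};

/* Concatenate prefix and name into buf; -1 if buf is too small. */
static int join(char *buf, size_t len, const char *prefix, const char *name)
{
    if (strlen(prefix) + strlen(name) + 1 > len)
        return -1;
    snprintf(buf, len, "%s%s", prefix, name);
    return 0;
}

int schema_file_name(char *buf, size_t len, enum schema_file kind,
                     const char *schema)
{
    return join(buf, len, schema_prefix[kind], schema);
}

/* Area data file: the area name itself. */
int area_data_file_name(char *buf, size_t len, const char *area)
{
    return join(buf, len, "", area);
}

/* Area data base key file: the area name prefixed by "k.". */
int area_key_file_name(char *buf, size_t len, const char *area)
{
    return join(buf, len, "k.", area);
}
```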

7. Index Block File.

This file provides storage for all the indices used for set
linkage in the data base. It is called "index.schemaname".

8. Open Lockout File.

This  file  is  used  by  intearity  routines  to  lock  out 
other  orocesses  when  setting  up  exclusive  or  Drotected 
access  priveleges  for  a  user  Drocess.  It  is  created  to  ini- 
tiate any  ooen  or  close  operation  and  removed  when  the 
operation  is  completed.  The  file  is  named  "opendum". 

9. Index Lockout File.

This file is used to lock out other processes while
attempting to acquire an index from the index block file.
It is handled in a manner analogous to the open lockout
file. The file is named "indexdum".
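
The text does not say how the lockout files are created or how a
waiting process retries. One way to obtain the required exclusion on a
present-day POSIX system, assumed here purely for illustration, is to
create the file with O_CREAT | O_EXCL, which fails atomically when the
file already exists. The function names lockout_acquire and
lockout_release and the one-second retry policy are hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical sketch of the "opendum"/"indexdum" lockout protocol.
 * Creating the file with O_CREAT | O_EXCL either succeeds atomically
 * (this process now holds the lock) or fails with EEXIST (another
 * process holds it).  Tries up to max_tries times, sleeping one
 * second between attempts.  Returns 0 on success, -1 otherwise.
 */
int lockout_acquire(const char *path, int max_tries)
{
    for (int i = 0; i < max_tries; i++) {
        int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0444);
        if (fd >= 0) {
            close(fd);              /* only the file's existence matters */
            return 0;
        }
        if (errno != EEXIST)
            return -1;              /* bad path, no permission, ... */
        if (i + 1 < max_tries)
            sleep(1);               /* lock is held elsewhere; wait */
    }
    return -1;
}

/* Release the lock by removing the file; returns 0 on success. */
int lockout_release(const char *path)
{
    return unlink(path);
}
```

A process would acquire "opendum" before setting up open or close
privileges and release it afterwards. Removing a stale "opendum" left
by a crashed holder is what the dbm free command does.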

10. Message Buffer File.

This file is used by the schema DBM program to
assemble messages to the user program which are longer than
512 characters. Prior to storing a new message, the file is
truncated to zero length.

B. Files in the Temporary Directory.

Certain files for a schema are stored in the UNIX temporary
directory ("/tmp"). This directory has the characteristic
that, should a system crash occur, all the files
contained within it are lost. This directory is used to
store files which are associated with the running of a
process and therefore should be lost if the process is
terminated by a system crash. The files are as follows.

1. Area Files.

These files are the area data files and area data
base key files for all areas designated as temporary. The
naming conventions for these files are identical to those
for non-temporary data and data base key files, with the
exception that the process id (pid) of their user process is
suffixed to the name.

2. Logical Usage Block File.

This file contains the logical usage block. This
block is used during open and close operations to record the
usage modes for the various areas currently in use. Its name
is the same as the name of the schema.
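
The naming rules of this appendix can be summarized in a few lines of
C. This is a sketch under stated assumptions: the helper names are
hypothetical, and because the text does not say whether a separator
precedes the process id of a temporary file, none is used here.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helpers illustrating the file naming conventions.
 * Each returns what snprintf returns: the length of the full name;
 * a result >= n means the name was truncated to fit.
 */

/* Schema directory files: prefix "s", "des", "dbm", "lib" or "index". */
int schema_file(char *out, size_t n, const char *prefix, const char *schema)
{
    return snprintf(out, n, "%s.%s", prefix, schema);
}

/* Data base key file of a non-temporary area: "k.areaname". */
int area_key_file(char *out, size_t n, const char *area)
{
    return snprintf(out, n, "k.%s", area);
}

/*
 * Temporary area data file (key == 0) or key file (key != 0):
 * same name as the permanent file, placed in /tmp, with the user
 * process id suffixed (no separator assumed).
 */
int temp_area_file(char *out, size_t n, const char *area, long pid, int key)
{
    return snprintf(out, n, "/tmp/%s%s%ld", key ? "k." : "", area, pid);
}
```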

APPENDIX C. DBM - THE DBM REQUEST PROCESSOR.


A. Introduction.

Dbm is a simple command language processor for schema
level requests. It enables the data administrator to perform
such functions as compiling a schema, moving data from
schema to schema and garbage collection. It also provides the
user with a method of executing a program that uses the
schema. The functions "ALTER", "DISPLAY" and "LOCKS"
described in Ref. 2 are provided by different means:
the UNIX "ed" and "list" functions are used, with
privacy provided by the UNIX file access permissions. The
function "COPY" (for subschema use) is inapplicable since
the C language DDL is not a proper subset of the schema DDL,
as was the case with COBOL. In addition, the cross checking
of the sub-schema and the schema is done at execution time.

Prior to using dbm, the schema directory must exist
(UNIX function "mkdir") and the schema source file should
already have been created using "ed", the UNIX text editor.
Dbm is called by entering "dbm pathname", where the pathname
is a path ending with the name of the schema to be
used. The program will respond in one of two ways: it will
display "cannot access schema" or ">". The first response
indicates one of the following: the schema specified does not
exist; access privacy prevents access; the schema is not a
directory; or a file called "s.schemaname" does not exist in
the directory, where "schemaname" is the name of the schema.
This response is followed by immediate termination of the
program. The second response is the dbm prompt character
and means that dbm is ready to accept commands.

B. Commands.

Upon receiving the prompt character, the user has the
option of specifying any of six commands as follows.

1. Compile.

The compile command causes the schema to be compiled.
The command format is "c" followed by a carriage
return. The compilation process scans the
schema source file, "s.schemaname", and creates the
encoded schema description file, "des.schemaname", and the
schema data base manager, "dbm.schemaname". If the necessary
permissions to create these files are not present, dbm
displays "cannot compile". If errors exist in the source
file, they are displayed. In order to divert the error list,
an optional path name parameter is allowed with the "c"
command. If the specified file can be opened, the error listing
is written to it; otherwise an error message is displayed at
the user's terminal. When the "c" command is finished, the
user receives a prompt. The compiler is currently a stub.

2. Move.

This command allows data to be moved from an old
version of a data base to the current one. The command format
is "m" followed by a path to a schema directory. Move
checks the specified schema name to determine whether it is a
directory containing an encoded schema description and a
schema data base manager. If the schema is nonexistent or
inaccessible, dbm displays "schemaname cannot be
accessed"; otherwise the data from the designated schema
is moved to the current schema. The data moved is
selected by finding all area, record and set entries with
common names and transferring the data which is associated
with these common areas, records and sets. Area, record and
set entries should have the same order in both schemas. All
data presently in the current schema will be lost. If the
move is unsuccessful, move produces error messages. After
the move is completed, the user receives a prompt. The move
function is currently a stub.

3. Execute.

This command causes the execution of a user program
to access the data base. Its format is "x" followed by a
path to a user program and the arguments for that user
program. If the user program is inaccessible, nonexistent or
not a program, dbm prints an error message and prompts.
Otherwise dbm executes the user program and, upon its
termination, prompts.

4. Garbage Collection.

This command compresses out wasted space in the area data
files of the data base. The format is "g" followed by a
carriage return. The following events then occur for each
area in the schema. The message "number of bytes wasted in
areaname is NNN. Collect? (y or n)" is displayed, where
"areaname" is the area being processed and "NNN" is the area
waste count. Entering "y" causes the area file to be
recreated with all records written in ascending order of
data base key and with all wasted space eliminated. Entering
"n" causes the next area to be processed. When all areas
have been processed, the user is prompted.

Because the schema DBM skeleton has no garbage collection
facility of its own, frequent garbage collection may be
necessary. Note that garbage collection also causes any
assignment of data base keys designed to juxtapose related
records to be reflected in the area data file.
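
Garbage collection as described amounts to rewriting an area in
ascending data base key order and recomputing each record's byte
offset (the values held in the "k." key file). The sketch below only
plans that rewrite in memory; struct arec and the function names are
hypothetical, area_waste shows one possible way to derive a waste
count, and copying record bytes and rewriting the key file are left to
the caller.

```c
#include <stdlib.h>

/* Hypothetical in-memory view of one record in an area data file. */
struct arec {
    long dbkey;        /* data base key */
    long offset;       /* current byte offset in the area data file */
    long length;       /* record length in bytes */
    long new_offset;   /* offset after collection (set by area_compact) */
};

static int by_dbkey(const void *a, const void *b)
{
    const struct arec *x = a, *y = b;
    return (x->dbkey > y->dbkey) - (x->dbkey < y->dbkey);
}

/* Waste: bytes in the file not occupied by any live record. */
long area_waste(const struct arec *r, size_t n, long file_size)
{
    long used = 0;
    for (size_t i = 0; i < n; i++)
        used += r[i].length;
    return file_size - used;
}

/*
 * Sort records into ascending data base key order and lay them out
 * back to back from offset zero.  Old offsets are kept in .offset so
 * the caller can still copy each record from its old position.
 * Returns the size of the collected file.
 */
long area_compact(struct arec *r, size_t n)
{
    qsort(r, n, sizeof r[0], by_dbkey);
    long next = 0;
    for (size_t i = 0; i < n; i++) {
        r[i].new_offset = next;
        next += r[i].length;
    }
    return next;
}
```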

5. Free.

If UNIX crashes during a dbm spawned function, certain
files may be left in a state making restart impossible.
The command "f" followed by a carriage return eliminates this
condition. The free command removes the files
"opendum" and "indexdum" from the schema directory, if they
exist, and scans the index block file, "index.schemaname",
freeing any locked indices.
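
The command formats above are simple enough that dbm's command
decoding can be sketched in a single routine. This is a hypothetical
illustration, not the thesis code: dbm_parse is an invented name, and
only the five commands described above are recognized.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/*
 * Decode one dbm command line: a single letter, optionally followed
 * by blanks and an argument.
 *   c [errorfile]   compile       m path          move
 *   x path args     execute       g               garbage collection
 *   f               free
 * The argument (possibly empty, trailing newline removed) is copied
 * into arg, which must hold at least one byte.  Returns the command
 * letter, or '?' if the line is not a recognized command.
 */
int dbm_parse(const char *line, char *arg, size_t n)
{
    arg[0] = '\0';
    while (isspace((unsigned char)*line))
        line++;
    int cmd = (unsigned char)*line;
    if (cmd == '\0' || strchr("cmxgf", cmd) == NULL)
        return '?';
    line++;
    if (*line != '\0' && !isspace((unsigned char)*line))
        return '?';                 /* e.g. "cx" is not a command */
    while (isspace((unsigned char)*line))
        line++;
    snprintf(arg, n, "%s", line);
    size_t len = strlen(arg);
    while (len > 0 && isspace((unsigned char)arg[len - 1]))
        arg[--len] = '\0';          /* drop trailing blanks and newline */
    return cmd;
}
```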

C. Interprocess Communication.

Whenever dbm must create a process, it uses the UNIX
functions "fork" and "exec". The former causes a complete
copy of the current process, called the child, to be created,
and the latter causes the current process to be overlaid and
replaced by the program specified. Dbm creates children as
needed to do its work. Whenever a need may arise for the
children to communicate with each other, dbm creates
interprocess communication pipes. A pipe appears as a
pair of files, called the ends, each with an open file
descriptor. One end of the pipe is open for reading and the
other for writing. Since all children created by a parent after
it executes a pipe call also have the pipe open, the pipe can be
used for passing data back and forth. Certain protocols must be
observed, however. The pipe can only effectively be used
for one-way transmission, since there is no protocol for
preventing a process from reading its own transmissions back
before the intended receiver has a chance to read them. The
receiver should close the writing end of the pipe; otherwise
the receiver will wait forever if it tries to read the pipe
after the sending process has terminated. This phenomenon is
caused by the fact that a process reading a pipe will go
into a wait state if any process, including the process doing
the read, has its writing end of the pipe open. If no process
has its writing end open (termination automatically
closes all of a process's open files and pipes), a read on
the pipe will return an end of file condition.
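
The protocol above can be demonstrated with the POSIX pipe, fork,
read, write and waitpid calls. In this sketch (pipe_roundtrip is a
hypothetical name) a child sends a string one way to its parent. The
parent's close(fd[1]) is the step the text insists on: without it the
final read would block forever instead of returning end of file.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Create a pipe, fork a child that writes msg into it, and read
 * everything the child sends into buf (at most n bytes).  Returns
 * the number of bytes received, or -1 if pipe or fork fails.
 */
ssize_t pipe_roundtrip(const char *msg, char *buf, size_t n)
{
    int fd[2];                      /* fd[0] read end, fd[1] write end */
    if (pipe(fd) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }
    if (pid == 0) {                 /* child: the sender */
        close(fd[0]);               /* one-way use: sender never reads */
        size_t len = strlen(msg);
        _exit(write(fd[1], msg, len) == (ssize_t)len ? 0 : 1);
    }

    /* Parent: the receiver.  Close our own write end, or read()
       below would wait forever once the child has exited. */
    close(fd[1]);

    size_t total = 0;
    ssize_t r;
    while (total < n && (r = read(fd[0], buf + total, n - total)) > 0)
        total += (size_t)r;         /* read() returns 0 at end of file */
    close(fd[0]);
    waitpid(pid, NULL, 0);
    return (ssize_t)total;
}
```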

APPENDIX D. SCHEMA DESCRIPTION FILE FORMAT.


The schema description file contains the encoded schema
description. The file is used by the dbm move command and
in initializing the schema DBM program. Its format is
described below.

A. Schema Entry.

The schema entry is headed by a null terminated string
containing the schema name. Next is a privacy lock consisting
of a null character, if no privacy lock is defined, or a
one character type followed by a null terminated string,
if a privacy lock is defined. If the lock is defined, a
lock type of "s" indicates that the string is a lock string
and "p" indicates a lock data base procedure name.
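
Because the same privacy lock format recurs for areas, records,
members, items and sets, a single decoding routine can serve them all.
The sketch below assumes the whole description file has been read into
memory; struct plock and the function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* A decoded privacy lock: type 0 = none, 's' = lock string,
   'p' = name of a lock data base procedure. */
struct plock {
    char type;
    const char *text;               /* points into the file buffer */
};

/* Read a null terminated string at p.  Returns a pointer just past
   the terminator, or NULL if none occurs before end. */
static const unsigned char *get_str(const unsigned char *p,
                                    const unsigned char *end,
                                    const char **out)
{
    if (p >= end)
        return NULL;
    const unsigned char *nul = memchr(p, '\0', (size_t)(end - p));
    if (nul == NULL)
        return NULL;
    *out = (const char *)p;
    return nul + 1;
}

/* Read one privacy lock; returns a pointer past it, or NULL. */
const unsigned char *get_lock(const unsigned char *p,
                              const unsigned char *end,
                              struct plock *lk)
{
    if (p >= end)
        return NULL;
    if (*p == '\0') {               /* a lone null: no lock defined */
        lk->type = 0;
        lk->text = "";
        return p + 1;
    }
    lk->type = (char)*p;            /* 's' or 'p' */
    return get_str(p + 1, end, &lk->text);
}

/* The schema entry: schema name, then the schema privacy lock. */
const unsigned char *get_schema_entry(const unsigned char *p,
                                      const unsigned char *end,
                                      const char **name,
                                      struct plock *lk)
{
    p = get_str(p, end, name);
    return p ? get_lock(p, end, lk) : NULL;
}
```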

B. Area Entries.

The area entries are preceded by a two byte number which
is the number of areas and a two byte maximum record
size. Each entry contains the following items.

The area name is a null terminated string. The temporary
indicator is a one character flag which is equal to
one for temporary areas and zero otherwise.

Fourteen data base procedure names are stored next. The
first six names are procedures to be called when the open
functions for retrieval, protected retrieval, exclusive
retrieval, update, protected update and exclusive update
are executed normally. The seventh name is a procedure to
be executed when a close is executed normally. The final
seven names are procedures corresponding to the first seven,
but which are executed when errors occur. If a procedure is
not specified for a function, a null string appears in
the file at the appropriate position.

Following the data base procedure names are the six
privacy lock entries. These locks have the same format as
the schema privacy lock. The six locks apply to the open
function for retrieval, protected retrieval, exclusive
retrieval, update, protected update and exclusive update,
respectively.

C. Record Entries.

The record entries include information generated by the
member subentries of the schema's set entries as well as
information from the schema's record entries. The record
entries are preceded by a one byte number indicating the
number of record types present. Each record entry contains
the following data.

The record name is a null terminated string. It is followed
by a two byte signed integer indicating the record size
for records of this entry type. A one byte location mode is
next. Additional location mode information may follow,
depending on the mode: for modes zero and seven, no additional
information; for mode one, a one byte record index
and a one byte item index; for modes two and three, two or
more one byte item index numbers preceded by a one byte
number indicating the number of indices present; for modes
four and five, a null terminated string naming a data base
procedure and two or more one byte item indices preceded by
a one byte number indicating the number of indices present;
and for mode six, a one byte set index. The location mode
information is derived from the LOCATION clause of the
record entry, and the encoding matches that in "rlocmod" of a
schema DBM record vector.
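
Decoding the location mode is a switch on the mode byte. The sketch
below is a hypothetical reader (struct locmode and get_locmode are
invented names); it checks only that the bytes are present and does
not enforce the "two or more" minimum on item indices.

```c
#include <stddef.h>
#include <string.h>

/* Decoded location mode information (field names are hypothetical). */
struct locmode {
    unsigned char mode;             /* 0..7, as in "rlocmod" */
    unsigned char rec, item;        /* mode 1 */
    unsigned char set;              /* mode 6 */
    const char *proc;               /* modes 4 and 5: procedure name */
    unsigned char nitems;           /* modes 2..5: number of item indices */
    const unsigned char *items;     /* modes 2..5: the item indices */
};

/* Decode the location mode bytes at p.  Returns a pointer past them,
   or NULL if the buffer ends early or the mode is unknown. */
const unsigned char *get_locmode(const unsigned char *p,
                                 const unsigned char *end,
                                 struct locmode *lm)
{
    memset(lm, 0, sizeof *lm);
    if (p >= end)
        return NULL;
    lm->mode = *p++;
    switch (lm->mode) {
    case 0:
    case 7:                         /* no further information */
        return p;
    case 1:                         /* record index, item index */
        if (end - p < 2)
            return NULL;
        lm->rec = p[0];
        lm->item = p[1];
        return p + 2;
    case 4:
    case 5: {                       /* procedure name, then item list */
        const unsigned char *nul = memchr(p, '\0', (size_t)(end - p));
        if (nul == NULL)
            return NULL;
        lm->proc = (const char *)p;
        p = nul + 1;
    }
        /* fall through */
    case 2:
    case 3:                         /* count byte, then the item indices */
        if (p >= end || end - p - 1 < *p)
            return NULL;
        lm->nitems = *p;
        lm->items = p + 1;
        return p + 1 + *p;
    case 6:                         /* set index */
        if (p >= end)
            return NULL;
        lm->set = *p;
        return p + 1;
    default:
        return NULL;
    }
}
```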

Following the location information is the area data
derived from the record type's WITHIN clause. This consists
of a one byte option code and area specifications. The area
specification format depends on the option code: for code
zero, a one byte area index; for code one, two or more one
byte area indices preceded by a one byte number indicating
the number of indices present; and for code two, no further data.
The encoding of the WITHIN information matches that of
"rarea" in the schema DBM record vector.

Fourteen data base procedure names are stored next in
the entries. The first seven are procedures to be called
when the functions "insert", "remove", "store", "delete",
"modify", "find" and "get" are executed normally on records
of this entry's type. The second seven data base procedures
are called when any of the above seven functions is executed
and an error occurs. If no procedure is defined in the
schema for a function, a null string appears in its place.

Following the data base procedure names are seven
privacy lock entries. These locks have the same format as
the schema privacy lock. The locks apply to the seven
functions listed in the previous paragraph.

1. Member Data.

Each record entry has zero or more set membership
entries following the record privacy locks. These entries
are preceded by a one byte number indicating the number of
membership entries present. The member entries for each
record type appear in the same order as the set entries in
the schema for which membership is defined. The contents of
each membership entry are as follows.

The set name for the membership is stored as a null
terminated string. Following the set name are two flag
bytes, which correspond to the bits of
"mflags" in a schema DBM member vector. The information
contained in these bits is derived from the MEMBER clause,
the KEY clause and the total number of SEARCH clauses
defined in the schema.

Next is a one byte number indicating the number of
items included in the primary key. This number
is at most 16 and is zero if no key is defined. Following
this value is the appropriate number of primary key element
pairs. Each key pair consists of a one byte collating code
and an item index. The collating code is zero if this element
of the primary key is ascending and one if it is descending.

Following the primary key specification are up to
seven search key strings (the exact number is recorded in
the flag bytes above). Each search key string is a null
terminated string of the item indices for the items in the
search key.

Following the search key strings is the set selection
data. If Format 2 of the SELECTION clause was used,
this data consists of the name of a data base procedure. If
Format 1 was used, the data is as follows. First is a one
byte code indicating the root selection mode. The code
corresponds to that in "mselflag" of a schema DBM member
vector. The remaining root selection data depends on the
root selection mode. For modes one and two, a one byte set
index follows the mode. For mode three, there is no further
root selection data. For mode four, the data is a null
terminated string of two byte pairs, each of which contains a
record index and a set index.

The remaining set selection data for Format 1 consists
of the number of "THEN THRU" clauses followed by the
appropriate number of two byte selection pairs. Each pair
contains a set index and an item index. These pairs are the
source data for "mssel" in a schema DBM member vector.

Following the set selection information are six data
base procedure names. The first three procedures are called
when the functions "insert", "remove" and "find" are
executed normally. The second three procedures are called when
these same functions are executed and an error results. If
no data base procedure is defined for a particular function,
a null string appears in its position.

Following the data base procedure names are three
privacy lock entries in the same format as the schema lock.
These privacy locks are for the functions described in the
previous paragraph.

2. Item Data.

Each record entry has one or more item description
entries following the set membership entries. The item
entries are preceded by a one byte number indicating the
number of items present. The item entries are stored in the
same order as the items they represent appear in the
record type being described. The contents of each item
entry are as follows.

The first data in an item entry is the name of the
item stored as a null terminated string. If the item is one
that is not generated by an item sub-entry in the schema,
the item name is a null string. Following the name is
a one byte level number. A level number between one and 100
is generated by a schema item sub-entry; 101 is a forward
link; 102 is a backward link; 103 is a link to owner; 104 is
an owner's link to first member; 105 is an owner's link to
last member; 106 is an owner's link to index; and 107 is a
CALC synonym link.

The remaining data depends on the level number. If
the level number is between one and 100, inclusive, the next
byte contains the data type; between 101 and 106, inclusive,
the next byte is the set index; and for 107 no other data is
needed. If a picture is defined for the item, it is stored
next as a null terminated string.
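
The level number ranges above map naturally onto an enumeration. The
enumerator and function names in this sketch are hypothetical; the
ranges and the rule for the byte that follows the level number are
taken from the text.

```c
/* Kinds of item entry, by level number (names are hypothetical). */
enum item_kind {
    ITEM_SCHEMA,        /* 1..100: generated by a schema item sub-entry */
    LINK_NEXT,          /* 101: forward link */
    LINK_PRIOR,         /* 102: backward link */
    LINK_OWNER,         /* 103: link to owner */
    LINK_FIRST,         /* 104: owner's link to first member */
    LINK_LAST,          /* 105: owner's link to last member */
    LINK_INDEX,         /* 106: owner's link to index */
    LINK_CALC_SYNONYM,  /* 107: CALC synonym link */
    ITEM_INVALID
};

enum item_kind item_kind(int level)
{
    if (level >= 1 && level <= 100)
        return ITEM_SCHEMA;
    if (level >= 101 && level <= 107)   /* enumerators are consecutive */
        return (enum item_kind)(LINK_NEXT + (level - 101));
    return ITEM_INVALID;
}

/* Nonzero if one more byte follows the level number: the data type
   for levels 1..100, the set index for 101..106, nothing for 107. */
int level_has_extra_byte(int level)
{
    return level >= 1 && level <= 106;
}
```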

Following the level and type data is a validity
checking description. The validity checking description is
a null terminated string which is encoded to fit the
requirements of "icheck" in a schema DBM item vector. It is
generated by the CHECK clause of a schema item sub-entry.
If no validity check is defined for the item, the string is
null.

Following the validity check description are three
two byte numbers representing the size (in bytes) of one
occurrence of the item, the number of occurrences of the item
in a record, and the starting byte number of the item within
the record.

The names of six data base procedures are next. The
first three are names of procedures to be called when the
functions "store", "get" and "modify" are executed
normally. The second three procedures are called when these
functions are executed and an error occurs. If a procedure
is not defined for a function, a null string appears in
its place.

Following the data base procedure names are three
privacy lock entries in the same format as the schema lock.
These locks apply to the functions mentioned in the previous
paragraph.

D. Set Entries.

Following the record entries are the set entries. These
entries are generated by set sub-entries in the schema. The
entries are preceded by a one byte number indicating the
number of sets defined. The contents of each entry are as
follows.

The first element in each entry is the set name stored
as a null terminated string. The set name is followed by a
one byte code which corresponds to the low order byte of
"sflags" in a schema DBM vector and describes the OWNER, SET IS
and ORDER clauses of the set sub-entry.

Next is a pair of bytes indicating the owner record.
The first byte is the owner record's index and the second is
the item index of the first item in the owner record having
to do with the set. Following the owner record data is a
one byte number indicating the number of member records, and
then that number of three byte member descriptions. Each member
description consists of the record index of the member; the
index of the set membership vector in the schema DBM record
vector for the record; and the item index of the first item
in the record having to do with this set. The order of the
member descriptions is alphabetical by member record name.

Following the member descriptions are four data base
procedure names stored as null terminated strings. The
first two are names of procedures to be called when the
"insert" or "remove" functions are executed normally. The
last two represent the same functions, but are called when
an error occurs. If a procedure is not defined in the
schema for a function, a null string appears in that place.

Following the data base procedure names are three
privacy locks. These locks are of the usual format. They
lock the functions "insert", "remove" and "find",
respectively.

APPENDIX E. INTERPROCESS MESSAGE FORMATS.

A. Messages Received by the Schema DBM.

Messages received are read into "smesin", a character
buffer of length 512. The first byte of the message is a
function code. The remainder of the message varies
depending on the function code. The message is terminated
by a mark, which is ten bytes of the octal code 0252. In
the message descriptions that follow, the function code is
included as part of the description heading.
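
Framing a message with the ten-byte mark, and finding where a received
message ends, can be sketched as follows. The function names are
hypothetical; the buffer size and mark come from the text. The scan
stops at the first run of ten 0252 bytes, so body data containing such
bytes could be misread; the text does not say how that case is avoided.

```c
#include <stddef.h>
#include <string.h>

#define SMESIN_LEN 512              /* size of the DBM receive buffer */
#define MARK_BYTE  0252             /* octal 0252 = 0xAA */
#define MARK_LEN   10

/* Append the terminating mark to a message of length len in a buffer
   of capacity cap.  Returns the new length, or 0 if it does not fit. */
size_t msg_add_mark(unsigned char *buf, size_t len, size_t cap)
{
    if (cap < MARK_LEN || len > cap - MARK_LEN)
        return 0;
    memset(buf + len, MARK_BYTE, MARK_LEN);
    return len + MARK_LEN;
}

/* Length of the message body: the offset of the first run of
   MARK_LEN mark bytes, or -1 if no complete mark is present yet. */
long msg_body_len(const unsigned char *buf, size_t len)
{
    size_t run = 0;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == MARK_BYTE) {
            if (++run == MARK_LEN)
                return (long)(i + 1 - MARK_LEN);
        } else {
            run = 0;
        }
    }
    return -1;
}
```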

1. Initial Call Message (Code 0).

The initial call is made by the user to request
validation of his sub-schema and to establish his access
permissions. Immediately following the function code is a
null terminated string containing the schema name. After
the schema name is a null terminated string containing the
privacy key for the schema. Following the schema entries
are the area entries.

a. Area Entries.

The area entries are preceded by a one byte
number indicating the number of areas in the sub-schema.
Each entry consists of seven null terminated strings. The
first string is the area name; the other six strings are
privacy keys and may be null strings. The privacy keys
specified are for retrieval, protected retrieval, exclusive
retrieval, update, protected update and exclusive update,
respectively.

b. Record Entries.

The record entries are preceded by a one byte
number indicating the number of records in the sub-schema.
Each entry consists of seven null terminated strings, an
encoded member list and an encoded item list. The first
string is the record name and the remaining six are privacy
keys and may be null strings. The privacy keys are for
insert, remove, store, delete, modify and find, respectively.

An encoded member list is headed by a one byte
number indicating how many member entries follow. Each
member entry consists of four null terminated strings. The
first string is the name of the set and the remaining three
are privacy keys and may be null. The privacy keys are for
insert, remove and find, respectively.

An encoded item list is headed by a one byte
number indicating how many item entries are in the list.
Each entry has a one byte entry code followed by four null
terminated strings. The first string is the item name and
the rest are privacy keys and may be null. The privacy keys
are for store, get and modify. The remainder of the item
entry varies depending on the entry code, as specified below.

(1) Atomic Item (Code 0). No further fields
exist in the item entry for an atomic item.

(2) Vector (Code 1). A one byte number indicating
the number of occurrences of the item follows the
privacy keys.

(3) Repeating Group (Code 2). A pair of one
byte numbers follows the privacy keys. The first number
indicates the number of subsequent item entries in the
repeating group and the second indicates the number of
occurrences in the group.

c. Set Entries.

The set entries follow the record entries. They
are preceded by a one byte number indicating the number of
set entries. Each set entry consists of four null terminated
strings. The first string is the set name and the
remaining three are privacy keys and may be null. The
privacy keys are specified for insert, remove and find.

2. Open Message (Code Q).

The function code is followed by a one byte mode,
which uses the same encoding as the C dbopen function (see
Appendix A, Section B.2.0). The remainder of the message
consists of one byte area index numbers. No area numbers
should be included if every area known to the subschema is
to be opened.

3. Close Message (Code 1).

The function code is followed by a list of one byte
area index numbers. No area numbers are included in the
message if all open areas are to be closed.

4. Find Message (Code 3).

The function code is followed by a one byte selection
type and selection codes. The possible selection types
and their corresponding record selection codes are:

a. Explicit Key.

Code zero indicates direct access, and the selection
code is a four byte data base key.

b. Owner Record.

Code one indicates selection of the owner record
for the set of the specified type to which the specified record
belongs. The first selection code is a one byte set
index and the second is a four byte data base key.

c. Relative Area.

Type code two specifies relative selection in
the designated area. The first selection code is a one byte
criterion, with zero, one, two and three meaning next,
previous, first and last, respectively; and four, five, six and
seven meaning next, previous, first and last of a specified
record type. If the criterion is four through seven, the
second selection code is a one byte record index. The last
selection code is a four byte data base key.

d. Relative Set.

Type code three specifies relative selection in
the designated set. The first selection code is a one byte
set index. The remaining selection codes are identical to
those for type code two.

e. CALC Key.

Type code four indicates hash key selection.
The first selection code is a one byte record index. The
remaining codes are item triples for all the items of the
specified record type which are known to the sub-schema and
are not associated with an "OCCURS" clause. An item triple
consists of an item specification, a one byte data type
code, and a data value of the specified type. An item
specification consists of a one byte item index followed
by zero or more one byte subscript values, as appropriate.
The data type codes are one for integer, two for single
precision floating point, three for double precision
floating point, four for null terminated string and five for
data base key.

f. Duplicate CALC Key.

Type code five indicates selection of the next
record with a hash key duplicating the hash key of the
specified record. The selection code is a four byte data
base key.

g. Current Set Data Value.

Type code six indicates that the first record of the specified type which matches the specified item values in the specified set occurrence is to be selected. The first selection code is the one byte record type index of the record type to be selected. The second selection code is a one byte set type index. The third selection code is a four byte data base key of a record which belongs to the set occurrence to be scanned. The remaining selection codes are zero or more item triples.

h. Selected Set Data Value.

Type code seven indicates that the first record of the specified type which matches the specified item values in the set occurrence selected through the specified record type's member subentry "SELECTION" clause is selected. The first two selection codes are identical to those in the preceding paragraph.

The third selection code is a one byte number which indicates the number of path selection codes which follow. The path selection codes are item quadruples. An item quadruple consists of a one byte record index, an item specification, a one byte data type code, and a data value of the type specified in the data type code. The remaining selection codes are zero or more item triples for items in the specified record type.

i. Current Set Duplicate Value.

Type code eight indicates that the record to be selected is the next record (if any) which is of the same type as the specified record, is in the set of the specified type, and matches the specified record in the specified items. The first selection code is a four byte data base key. The second selection code is a set type index. The remaining selection codes are item specifications.

5. Get Message (Code 5).

Following the function code is a four byte data base key. The remainder of the message consists of item doubles for the items the user program desires. An item double consists of an item specification and a one byte data type. An omitted subscript in an item specification means every occurrence of the vector or repeating group is desired. A double for a repeating group has a data type of zero. The doubles for the elements in the repeating group must immediately follow the repeating group's double.

6. Store Message (Code 12).

The remainder of the message is a one byte record type index.

7. Response to Request for Data Message (Code 100).

This message provides data requested in a Request for Data Message sent by the schema DBM. The function code is followed by an item specification for each requested item, consisting of a one byte record type index, a one byte item index and a one byte data type. The requested data follows the item type specifications in exactly the same order as in the Request for Data. For group items, the order of appearance of the subordinate items of the group in the item specifications is the order in which the items must appear within each occurrence of the group item in the data portion of the message. The other data may be area or set type indices or data base keys.

8. Insert Message (Code 7).

The function code is followed by a four byte data base key of the record to be inserted. The remainder of the message consists of set specifier pairs. A set specifier pair consists of a one byte set type index followed by a four byte data base key indicating the current record in the current occurrence of the set of the type specified.
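A minimal sketch of assembling an Insert Message from a record key and its set specifier pairs follows. The struct and function names are hypothetical, and the low-byte-first order for four byte data base keys is an assumption, since the text gives only the key width.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FN_INSERT 7   /* Insert Message function code */

/* ASSUMPTION: four byte data base keys travel low byte first. */
static size_t put_dbkey(unsigned char *p, uint32_t key)
{
    for (int i = 0; i < 4; i++)
        p[i] = (unsigned char)(key >> (8 * i));
    return 4;
}

/* One set specifier pair: set type index plus the data base key of the
 * current record in the current occurrence of that set. */
struct set_spec {
    unsigned char set_index;
    uint32_t current_key;
};

/* Function code, key of the record to insert, then the pairs. */
static size_t build_insert(unsigned char *buf, uint32_t record_key,
                           const struct set_spec *pairs, size_t npairs)
{
    size_t n = 0;
    buf[n++] = FN_INSERT;
    n += put_dbkey(buf + n, record_key);
    for (size_t i = 0; i < npairs; i++) {
        buf[n++] = pairs[i].set_index;
        n += put_dbkey(buf + n, pairs[i].current_key);
    }
    return n;
}
```

Each pair adds five bytes, so an insertion into two sets produces a fifteen byte message.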

9. Remove Message (Code tl).

Following the function code is a four byte data base key. The remainder of the message is zero or more one byte set type indices.

10. Modify Message (Code 8).

Following the function code is a one byte number indicating how many set memberships are being modified, followed by the required number of set specifier pairs. The remainder of the message consists of zero or more item triples.

11. Delete Message (Code 2).

Following the function code is a four byte data base key. The remainder of the message is a one byte deletion code with the same values as for the parameter of the C "delete" function (see Appendix A, Section B.2.i).

B. Messages Transmitted by the Schema DBM.

Messages transmitted are in response to messages received and fall into two categories: normal responses and error messages. The first byte of the message is a response code, which is zero for normal responses and equal to the error code for error messages. The format of the responses after the first byte varies depending on the previously received message (for normal responses) or on the error type (for error messages). These formats are detailed below.

1. Normal Responses.

a. Initial Call Message.

The normal response to an Initial Call is an encoded schema description for the user. This description is a series of one byte numbers, each of which represents the index number associated with one of the data base names (except the schema name) in the Initial Call Message. For areas, this is the index number of the area; for records, the index number of the record followed by the index number of each data item or data aggregate; for sets, the set number.

b. Find Message.

The normal response to a Find Message contains the information necessary to establish currency for the selected record. This information consists of a four byte data base key, a one byte record type index indicating the type of the record, and zero or more one byte set type indices indicating the set types of the set occurrences in which the record participates.
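The receiving side can be sketched the same way. This hypothetical decoder checks the response code in the first byte, then reads the data base key, record type index and trailing set type indices of a normal Find response. It assumes the caller knows the total message length from the pipe read, since the message carries no count of set indices, and it assumes low-byte-first data base keys.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decoded normal response to a Find Message (hypothetical names). */
struct find_result {
    uint32_t dbkey;
    unsigned char rec_index;
    const unsigned char *sets;   /* set type indices, not copied */
    size_t nsets;
};

/* Returns 0 for a normal response, the error code for an error message,
 * or -1 if the message is too short to decode. */
static int decode_find_response(const unsigned char *msg, size_t len,
                                struct find_result *out)
{
    if (len < 1)
        return -1;
    if (msg[0] != 0)
        return msg[0];           /* error message: code in first byte */
    if (len < 6)
        return -1;
    out->dbkey = (uint32_t)msg[1] | (uint32_t)msg[2] << 8 |
                 (uint32_t)msg[3] << 16 | (uint32_t)msg[4] << 24;
    out->rec_index = msg[5];
    out->sets = msg + 6;
    out->nsets = len - 6;
    return 0;
}
```

A one byte message containing 26, for instance, is returned as error code 26 (no record satisfies the record selection expression).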

c. Store Message.

The normal response to a Store Message is a Request for Data Message. This message requests the data needed to perform the store function. The message is composed of request entries, each prefixed by a one byte request type code. The request entry formats are listed below along with their request type codes.




(1) Data Item Request (Code 0). This request entry consists of an item specification. A repeating group index is an implied request for all subordinate elements in the repeating group known to the sub-schema. If a subscript is missing, the data in all occurrences of the relevant element is requested.

(2) Area Index Request (Code 1). This request consists of the request code alone. It requests the contents of "areaid" in the user program.

(3) Data Base Key (Code 2). This request consists of the request code alone. It requests the data base key associated with "keyname" in the user program.

(4) Current of Set. This request consists of a set type index. It requests the data base key of the current record of the specified set type.

d. Request for Data Message.

The normal response to a Request for Data is identical to that for a Find. The record information passed is for the record just stored.

e. Messages with a Null Response.

The normal response to certain messages is a response code only. These messages are Open, Close, Insert, Modify, Remove and Delete.

2. Error Messages.

a. Invalid Sub-schema (Code 60).

This response is the result of a mismatch between the schema and the sub-schema presented in the user program's Initial Call Message. After the error code is a one byte first error type: zero for schema entry; one for area entry; two for record entry; and three for set entry. Following this code are three one byte entries giving the number of the first erroneous entry in the area, record and set entries respectively.

b. Area Already Open (Code 28).

This response to an Open Message has a one byte error count following the response code.

c. Truncation of Data (Code 5a).

This response to a Get Message has a format identical to the normal response to a Get Message except for the response code.

d. Messages with Error Code Only. The remaining error responses consist of an error code only, as follows.

(1) Data Base Key Invalid (Code 2).
(2) Data Items Invalid or Inconsistent (Code 4).
(3) Violation of DUPLICATES NOT ALLOWED Clause (Code 5).
(4) End of Set or Area (Code 7).
(5) Invalid Record or Set Index (Code 8).
(6) Attempted Update on Retrieval Only Area (Code 9).
(7) Privacy Breach Attempted (Code 10).
(8) Media Space not Available (Code 11).
(9) Data Base Key not Available (Code 12).
(10) Insert into Mandatory Automatic Set (Code 14).
(11) Remove out of Mandatory Set (Code 15).
(12) Insert into Set with Existing Membership (Code 16).
(13) Implicitly Referenced Area not Available (Code 18).
(14) Affected Area not Open (Code 21).
(15) Illegal Area Index (Code 23).
(16) Set Occurrence would Scan Temporary and Permanent Areas (Code 24).
(17) No Set Occurrence Satisfies Specified Arguments (Code 25).
(18) No Record Satisfies Record Selection Expression (Code 26).
(19) CHECK Clause Violated (Code 27).
(20) Usage Mode Conflict with Other Processes (Code 29).
(21) Unqualified DELETE on Owner of a Non-empty Set (Code 30).
(22) No Initial Call Message (Code b\).
(23) Indecipherable or Unprocessable Message (Code 100).

APPENDIX F. DBM SKELETON PROGRAM.

The skeleton program is identical for every schema DBM. Compiled schema DBMs differ only in the values associated with certain DEFINE'd constants controlling array sizes, in the initialization of certain arrays, and in the data base procedures which are included in the compiled version. When the schema DBM is executed, it initializes its tables from the schema description in the Schema Description File. These tables drive the processing of the data base. The schema DBM concurrently reads the user's sub-schema description from the interprocess communication pipe. The sub-schema description is validated and index numbers are produced to allow translation of user requests and data into system requests and data.

This Appendix describes the data organization of the skeleton. Documentation for each service routine and utility routine is contained in the source program listings. Listings and machine readable source of the DBM skeleton can be obtained by contacting the Department of Computer Science (Code 52Rs). For an explanation of the values associated with DEFINE'd constants mentioned in this Appendix, see Appendix H. For a description of a user's view of the service routines, see Appendix A.

A. General Tables.

Certain tables and buffers are available for use by other tables.

1. The character buffer, "scharbuf", is dimensioned by a DEFINE'd constant. It is used to store character strings and other relatively short variable length data.

2. "Procname" is an array of strings containing the names of the data base procedures. "Procpoint" is an array of function pointers pointing to the functions named in "procname". These arrays are used to set up the data base procedure pointers used in other tables. Both "procname" and "procpoint" are dimensioned by a DEFINE'd constant.

3. The privacy vector array ("pvec"), dimensioned by a DEFINE'd constant, is used for data item and data aggregate privacy information. Its elements are structures of type "privect". The format of a "privect" structure is:

    struct privect {
        char ptype;       // type of privacy lock
        char *plock;      // pointer to privacy lock
    };

The "ptype" code is "s" if the "plock" pointer points to a string, and "p" if a data base procedure is indicated.

4. The record buffer array, "srecbuf", contains all the record buffers for areas, records and sets.

B. Organization for Area Management.

The schema DBM contains an array of structures of type "areavect", dimensioned by a DEFINE'd constant, called "avec". Each structure in the array is used to describe an area in the data base. The format of an "areavect" structure is as follows:

    struct areavect {
        int aflags;             // see below
        int ause;               // usage count of last reference
        char *adatapath;        // pointer to path to data file
        int adatades;           // file descriptor for data file
        char *akeypath;         // pointer to path to key file
        int akeydes;            // file descriptor for key file
        char acrecloc[3];       // location of current record
        char *acurrec;          // pointer to current rec buffer
        int acurkey[2];         // first key # in current key buff
        char akeybuf[768];      // buffer for db key mappings
        int (*provec[14])();    // pointers to db procedures
        int apflags;            // permission flags for functions
        int awaste;             // current waste count
    };

"Aflags" is formed by a bit-wise OR of the following octal codes:

    TEMP      0100000   The area is temporary
    PHSOP      040000   The files are physically open
    KEYBMOD    020000   Current key block modified
    CRECMOD    010000   Current record modified
    KEYBVAL     04000   Key block is valid
    CRECVAL     02000   Record buffer is valid
    CRECSIZ     01000   Current record has increased size
    KNOWN        0400   Area is known to the sub-schema
    RETRV          01   Area open for retrieval
    PRETRV         02   Area open for protected retrieval
    ERETRV         03   Area open for exclusive retrieval
    UPDT           04   Area open for update
    PUPDT          05   Area open for protected update
    EUPDT          06   Area open for exclusive update
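In the "aflags" table, codes 01 through 06 are consecutive values rather than separate bits (03 equals 01 OR 02), so they appear to be stored as a single value in the low order three bits. The sketch below reads them that way; the mask name and helper functions are hypothetical, and unsigned arithmetic is used because 0100000 is the sign bit of a 16 bit int.

```c
#include <assert.h>

/* Octal flag values from the "aflags" table. */
#define TEMP     0100000
#define PHSOP     040000
#define KEYBMOD   020000
#define CRECMOD   010000
#define KEYBVAL    04000
#define CRECVAL    02000
#define CRECSIZ    01000
#define KNOWN       0400

/* Usage mode codes; ASSUMPTION: one value held in the low three bits. */
enum { RETRV = 01, PRETRV = 02, ERETRV = 03, UPDT = 04, PUPDT = 05, EUPDT = 06 };
#define USAGE_MASK 07

static int usage_mode(unsigned aflags)
{
    return (int)(aflags & USAGE_MASK);
}

/* True for UPDT, PUPDT and EUPDT (values four through six). */
static int open_for_update(unsigned aflags)
{
    int m = usage_mode(aflags);
    return m >= UPDT && m <= EUPDT;
}
```

Reading the mode as a value rather than testing individual bits avoids mistaking ERETRV (03) for a combination of RETRV and PRETRV.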

The "apflags" are set when the user sub-schema is validated. These flags indicate the functions allowed the user for the area, as follows:

    0100000   Retrieval
     040000   Protected retrieval
     020000   Exclusive retrieval
     010000   Update
      04000   Protected update
      02000   Exclusive update

C. Logical Usage Block.

The logical usage block records the current usage mode for each area in the data base currently being used by any schema DBM. The logical usage block is organized into two byte integer entries, one for each area in the data base. Each two byte entry is divided into four fields: bit 15 is the exclusive use bit; bits 14 through 10 form a count of retrievers; bits nine through five, a count of protected retrievers; and bits four through zero, a count of updaters. The logical usage block can record up to 31 users in each category. If a schema DBM has an area open for a protected mode, the updater count is set to 31.
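The field layout of a usage entry maps directly onto shifts and masks. The following sketch, with hypothetical helper names, counts users in a chosen field, refuses to exceed the five bit limit of 31, and models the protected mode rule by filling the updater field.

```c
#include <assert.h>
#include <stdint.h>

/* Layout of one two byte entry in the logical usage block. */
#define EXCL_BIT   0x8000u   /* bit 15: exclusive use */
#define RET_SHIFT  10        /* bits 14..10: retrievers */
#define PRET_SHIFT 5         /* bits 9..5: protected retrievers */
#define UPD_SHIFT  0         /* bits 4..0: updaters */
#define COUNT_MAX  31u       /* largest value of a five bit field */

/* Extract the five bit count stored at "shift". */
static unsigned usage_count(uint16_t e, int shift)
{
    return ((unsigned)e >> shift) & COUNT_MAX;
}

/* Add one user to the field at "shift"; returns 0 if it is already full. */
static int add_user(uint16_t *e, int shift)
{
    if (usage_count(*e, shift) == COUNT_MAX)
        return 0;
    *e = (uint16_t)(*e + (1u << shift));
    return 1;
}

/* A protected mode opener sets the updater count to 31. */
static void mark_protected(uint16_t *e)
{
    *e = (uint16_t)(*e | (COUNT_MAX << UPD_SHIFT));
}
```

Because a full updater field makes further additions fail, setting it to 31 is one simple way for later update requests to be refused while the protected mode is in force.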

D. Organization for Record Management.

The schema DBM contains an array of structures of type "recvector". Each element of the array is the record vector for a record type defined in the schema. The format of a "recvector" structure is:

    struct recvector {
        int rflags;                 // see below
        char rlocmod[3];            // see below
        char rarea[3];              // see below
        int (*pprovec[4])();        // pointers to db procedures
        char rnumsets;              // number of set types for record
        struct member *rsets;       // pointer to member vectors
        char rnumitem;              // number of items in record
        struct itemvect *ritems;    // pointer to item vectors
        int rpflags;                // permission flags for functions
        char *rcurrec;              // pointer to current record buffer
    };

"Rflags" is currently used only to indicate whether or not the sub-schema knows about the record type. The octal code KNOWN (0400) is used for this function.

"Rlocmod" is derived from the LOCATION clause of the schema RECORD entry (see Section 3.3.4 of Ref. 2) and is interpreted as follows. Character zero gives the location mode: zero for DIRECT with key passed as a parameter; one for DIRECT with key stored in a record; two for CALC using the standard key transformation with no duplicates; three for CALC using the standard key transformation with duplicates allowed; four for CALC using a data base procedure with no duplicates; five for CALC using a data base procedure with duplicates allowed; six for VIA a set; and seven for SYSTEM mode. The last two bytes of "rlocmod" vary in meaning depending on the mode. For mode zero, bytes one and two are unused. For mode one, byte one contains the record type and byte two the item index of the data item which holds the data base key. For modes two through five, bytes one and two hold a pointer to the randomizing key description. A randomizing key description is a null terminated series of bytes, the first containing the item index of the key link item and subsequent bytes containing the item indices of the fields of the randomizing key. For modes four and five, the randomizing key description is headed by a pointer to the appropriate data base procedure. For mode six, byte one contains the set index of the set to be consulted and byte two is unused.

"Rarea" is derived from the WITHIN clause of the schema RECORD entry and is formatted as follows. Byte zero is the WITHIN option code and has the following interpretation: zero, all records are within a single area; one, multiple areas are possible (selected by a user input value); two, the area will be the area of the owner of the set of a specified type in which the record participates. The values of bytes one and two of "rarea" vary depending on the WITHIN option: for zero and two, byte one contains an area index number and byte two is unused; for one, bytes one and two contain a pointer to a WITHIN criterion. A WITHIN criterion is a null terminated string of bytes, each containing one of the allowed area numbers for this record.
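Several of the "rlocmod" location mode codes described above share properties: modes two through five are CALC modes, the odd ones among them allow duplicates, and modes four and five use a data base procedure. A few predicates capture that encoding. This is a sketch with hypothetical enum and function names; it reads only byte zero and, for VIA mode, byte one, and does not dereference the stored pointers.

```c
#include <assert.h>

/* Location mode codes held in character zero of "rlocmod". */
enum locmode {
    LM_DIRECT_PARAM = 0, LM_DIRECT_ITEM = 1,
    LM_CALC_STD = 2, LM_CALC_STD_DUP = 3,
    LM_CALC_PROC = 4, LM_CALC_PROC_DUP = 5,
    LM_VIA_SET = 6, LM_SYSTEM = 7
};

static int is_calc(int mode)
{
    return mode >= LM_CALC_STD && mode <= LM_CALC_PROC_DUP;
}

/* Among the CALC modes, three and five allow duplicates. */
static int calc_allows_dups(int mode)
{
    return is_calc(mode) && (mode & 1);
}

static int calc_uses_proc(int mode)
{
    return mode == LM_CALC_PROC || mode == LM_CALC_PROC_DUP;
}

/* For VIA mode, byte one names the set to consult; -1 otherwise. */
static int via_set_index(const unsigned char rlocmod[3])
{
    return rlocmod[0] == LM_VIA_SET ? rlocmod[1] : -1;
}
```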

1. Member Vectors.

Each record vector contains a pointer ("rsets") to an array of set membership vectors if it participates in any sets. A set membership vector is a structure of type "member". The format of the "member" structure is:

    struct member {
        char msetnum;               // set index for this entry
        int mflags;                 // see below
        int morder;                 /* flag bits for key item:
                                       0 = ascending, 1 = descending */
        char *mpkey;                // pointer to items for prime key
        char **mskey;               // pointer to SEARCH index pointers
        char mselflag;              // root selection flag
        int *mselid;                // pointer to root selection id
        int (*mssel)();             // pointer to set selection spec
        int (*mprovec[6])();        // pointers to db procedures
        char mpflags;               // permission flags for functions
    };


The "mflags" for a member entry is formed by a bit-wise OR of the following codes:

    MMAND    0100000   Membership is mandatory
    MAUTO     040000   Membership is automatic
    MLINK     020000   Member is linked to owner
    MSSEL     010000   Set selection by db procedure
    MPKEY      04000   Primary key is defined
    MPRKEY     02000   RANGE option
    MPFKEY     01000   Duplicates first
    MPLKEY      0400   Duplicates last
    MPDKEY      0200   Duplicates arbitrary
    MPNKEY      0100   Nulls allowed
    MKNOWN       040   Membership is known to sub-schema

Whenever any of MPRKEY through MPNKEY are set, MPKEY must be set. Additionally, the low order three bits of "mflags" contain the number of secondary indices defined to support SEARCH keys for this membership.
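The two rules stated for "mflags" can be checked mechanically. In this hypothetical sketch the three bit mask limits the SEARCH index count to seven, which matches the limit of seven "SEARCH KEY" clauses noted in Appendix G.

```c
#include <assert.h>

/* Key related "mflags" bits from the table above. */
#define MPKEY   04000
#define MPRKEY  02000
#define MPFKEY  01000
#define MPLKEY   0400
#define MPDKEY   0200
#define MPNKEY   0100
#define MSEARCH_MASK 07   /* low order three bits: SEARCH index count */

/* Any of MPRKEY through MPNKEY set requires MPKEY to be set as well. */
static int mflags_consistent(unsigned mflags)
{
    unsigned key_options = MPRKEY | MPFKEY | MPLKEY | MPDKEY | MPNKEY;
    return (mflags & key_options) == 0 || (mflags & MPKEY) != 0;
}

/* Number of secondary indices that support SEARCH keys (0 through 7). */
static unsigned search_index_count(unsigned mflags)
{
    return mflags & MSEARCH_MASK;
}
```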

The pointer "mskey" points to an array of pointers dimensioned by the count stored in "mflags". Each pointer points to a null terminated string. The first byte of this string indicates whether duplicates are allowed; the second byte is the item index for the owner record item linking the search index; and the remaining bytes are item indices, each representing a field in the SEARCH key for the search index. Duplicates are allowed if the first byte of the string is a one and not allowed if it is a two.

"Mpflags" is set when the sub-schema is validated. These flags indicate the functions allowed the user for the record/set pair, as follows:

    0200   Insert
    0100   Remove
     040   Find
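Each SEARCH index description string pointed to by "mskey" can be unpacked as sketched below. The struct and function names are hypothetical, and the sketch assumes item indices are never zero, since a zero byte would end the null terminated string.

```c
#include <assert.h>
#include <stddef.h>

/* Decoded view of one SEARCH index description string. */
struct search_key {
    int dups_allowed;             /* first byte: 1 = allowed, 2 = not */
    unsigned char owner_link;     /* item index of owner linking item */
    const unsigned char *fields;  /* item indices of SEARCH key fields */
    size_t nfields;
};

/* Returns 0 on success, -1 if the string is malformed. */
static int decode_search_key(const unsigned char *s, struct search_key *k)
{
    if (s[0] != 1 && s[0] != 2)
        return -1;                /* unknown duplicates code */
    if (s[1] == 0)
        return -1;                /* owner link item missing */
    k->dups_allowed = (s[0] == 1);
    k->owner_link = s[1];
    k->fields = s + 2;
    k->nfields = 0;
    while (k->fields[k->nfields] != 0)
        k->nfields++;
    return 0;
}
```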

When "mflags" has MSSEL set, "mssel" is a pointer to a data base procedure for set selection and "mselflag" and "mselid" are unused. If MSSEL is not set, "mselflag" is a code describing the root set selection in the set selection chain for this member. The possible values of "mselflag" are as follows: one for singular sets; two for current of set type; three for through data base key; and four for through CALC key. The data in "mselid" depends on the value of "mselflag". For singular sets and current of set selection, "mselid" contains the set index of the root set. For selection by data base key, "mselid" is not used. For selection by CALC key, "mselid" is a pointer to a null terminated string of character pairs which are the record and item indices of the items to be used in forming the CALC key.

When set selection is not by a data base procedure and the number of THRU clauses in the SELECTION clause for this member entry is greater than one, then a selection chain exists and "mssel" is a pointer to a null terminated string of byte pairs. Each pair in this string describes the set selection for one of the successive set types in the set selection chain. Each pair consists of the index of the next set in the chain and the index of the data item which must be matched in the owner record.
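A sketch of walking the selection chain string that "mssel" points to follows. It assumes, as the null termination implies, that a zero set index marks the end; the struct, the function, and its "max" bound are hypothetical additions for safe copying.

```c
#include <assert.h>
#include <stddef.h>

/* One step of a set selection chain: the next set and the owner item
 * that must be matched. */
struct sel_step {
    unsigned char set_index;
    unsigned char owner_item;
};

/* Copy up to "max" pairs from the chain string; return the count. */
static size_t read_selection_chain(const unsigned char *chain,
                                   struct sel_step *steps, size_t max)
{
    size_t n = 0;
    while (n < max && chain[2 * n] != 0) {
        steps[n].set_index = chain[2 * n];
        steps[n].owner_item = chain[2 * n + 1];
        n++;
    }
    return n;
}
```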

2. Item Vectors.

Each record vector contains a pointer ("ritems") to an array of item description vectors. Each element of the array is a structure of type "itemvect" and describes one of the fields appearing in the record. A field may be in a record for CALC key linkage, for set linkage, or as a result of a data sub-entry in the record's source description. The format of an "itemvect" structure is:

    struct itemvect {
        char *iname;                // pointer to name of item
        char ilevel;                // item level
        char itype;                 // type of data represented
        char *ides;                 // data description pointer
        char *icheck;               // pointer to validity check
        int isize;                  // size of one occurrence of item
        int inocc;                  // number of occurrences of item
        int isbyte;                 // starting byte within record
        int (*iprovec[6])();        // pointers to db procedures
        struct privect *ipvec;      // pointer to privacy locks
        char ipflags;               // item privacy flags
    };

The "ilevel" entry specifies the level number of the item. A level number between one and 100 indicates the item was generated by an item sub-entry; level 101, a forward chain link for a set; 102, a backward link; 103, a link to owner; 104, an owner's link to first member; 105, an owner's link to last member; 106, an owner's link to index; and 107, a CALC synonym link.

"Itype" is the data type code for the item: zero for repeating groups; one for a PICTURE'd character string; two for a PICTURE'd numeric string; three for a binary integer; five for a single precision floating point number; six for a double precision floating point number; seven for a character string; eight for a bit string; and nine for a data base key.

If the item is a set link, "ides" is the set index for the set. If the item has a picture specified, "ides" is a pointer to a character string containing the picture (see Ref. 2, Section 3.3.* for a description of PICTURE'd data). In other cases, "ides" is unused.

"Icheck" is a pointer to a validity checking description for the item. The first character in the description is a flag byte. If the high order bit of the flag byte is on, the picture is used as a check. If bit six is on, a data base procedure is used as a check. If bit five is on, check values are used. If a data base procedure is specified, a pointer to the procedure is stored immediately after the flag character. If check values are specified, they are stored at the end of the validity checking description. Check values consist of a series of check entries separated by ASCII comma characters and terminated by a null byte. Each check entry is either a literal of the same format as the item or a pair of such literals separated by an ASCII dash character.
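The flag bits and the check value syntax can be sketched as follows. The sketch assumes the item is a PICTURE'd numeric string, so that each literal is a run of decimal digits readable with strtol; literals for other item formats (binary integers, for example) would need a different comparison. Function and macro names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

#define CHK_PICTURE 0200   /* high order bit: picture used as check */
#define CHK_PROC    0100   /* bit six: data base procedure used as check */
#define CHK_VALUES  040    /* bit five: check values used */

static int uses_check_values(unsigned char flag)
{
    return (flag & CHK_VALUES) != 0;
}

/* Return 1 if "v" satisfies a check value list such as "1-9,20,30-39".
 * ASSUMPTION: literals are decimal digit strings; malformed lists fail. */
static int value_passes(const char *list, long v)
{
    const char *p = list;
    while (*p != '\0') {
        char *end;
        long lo = strtol(p, &end, 10);
        if (end == p)
            return 0;                  /* no literal where one was expected */
        long hi = lo;
        if (*end == '-') {             /* a pair of literals: a range */
            const char *q = end + 1;
            hi = strtol(q, &end, 10);
            if (end == q)
                return 0;
        }
        if (v >= lo && v <= hi)
            return 1;
        if (*end == ',')
            end++;                     /* skip the separator */
        else if (*end != '\0')
            return 0;                  /* unexpected character */
        p = end;
    }
    return 0;
}
```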

The "ipflags" are set when the sub-schema is validated. The octal codes and function permissions are:

    0200   Store is permitted
    0100   Get is permitted
     040   Modify is permitted

E. Organization for Set Management.

The schema DBM program contains an array of set vectors. Each set vector describes one of the set types defined in the schema and is a structure of type "setvect". The format of a "setvect" structure is:

    struct setvect {
        int sflags;                 // see below
        char sowner;                // owner record type
        char sfitem;                // index of 1st item for set
        char *smemb;                // pointer to member description
        int (*sprovec[4])();        // pointers to db functions
        char spflags;               // function permission flags
        char scurown[3];            // db key of current owner rec
        char *scurrec;              // pointer to current record buf
    };

The value of "sflags" is formed by a bit-wise OR of the following octal codes:

    KNOWN     0400   Set type is known to sub-schema
    SYSTEM    0200   Singular set
    DYNAMIC   0100   Dynamic set type
    PRIOR      040   Members contain backward links
    INDEXED    020   Primary set order is via an index

The lower four bits of "sflags" indicate the order criteria for the set: zero, the order is immaterial; one, new records are inserted at the front of the set; two, new records are inserted at the end of the set; three, new records are inserted after the current record of the set; four, new records are inserted prior to the current record; five through 11, a sorting order. Five indicates sorted by data base key; six, sorted by record names and then by member keys; seven, sorted by the member record keys with the relationship between records of different types immaterial; and eight through 11 indicate sorted by member keys (this implies that the format of each member record's keys is the same). The last four codes specify duplicate processing: eight, duplicates are allowed; nine, duplicates are first; ten, duplicates are last; and 11, duplicates are not allowed. Items in the owner record dealing with the set are assumed to be stored contiguously.

"Smemb" points to a null terminated string of bytes indicating the member record types for the set. The string is made up of three byte entries. The first byte is the member record index; the second is the membership vector index of the member record for this set; and the third is the item vector index of the first item in the record dealing with this set. All items having to do with the set are assumed to be stored contiguously in the member records.
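Looking up a member record type in "smemb" means stepping through the string three bytes at a time rather than scanning for the first zero byte, because a membership or item index in the second or third position might itself be zero. This sketch assumes record indices are nonzero so that a zero first byte marks the end; the names are hypothetical.

```c
#include <assert.h>

/* One three byte "smemb" entry. */
struct smemb_entry {
    unsigned char rec_index;     /* member record index */
    unsigned char member_index;  /* membership vector index in that record */
    unsigned char first_item;    /* item vector index of first set item */
};

/* Find the entry for record type "rec"; returns 1 and fills *e if found. */
static int find_member(const unsigned char *smemb, unsigned char rec,
                       struct smemb_entry *e)
{
    for (const unsigned char *p = smemb; p[0] != 0; p += 3) {
        if (p[0] == rec) {
            e->rec_index = p[0];
            e->member_index = p[1];
            e->first_item = p[2];
            return 1;
        }
    }
    return 0;
}
```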

The "spflags" are set when the sub-schema is validated. These flags indicate the functions allowed the user for the set:

    0100   Insert is allowed
     040   Remove is allowed
     020   Find is allowed
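The order criteria kept in the lower four bits of "sflags" lie in bits zero through three, clear of INDEXED (020) and the higher flag bits, so they can be extracted with a four bit mask. A sketch with hypothetical enum and function names:

```c
#include <assert.h>

#define ORDER_MASK 017   /* lower four bits of "sflags" */

/* Order criteria codes, named after the descriptions in the text. */
enum set_order {
    ORD_IMMATERIAL = 0, ORD_FIRST = 1, ORD_LAST = 2, ORD_NEXT = 3,
    ORD_PRIOR = 4, ORD_SORT_DBKEY = 5, ORD_SORT_RECNAME = 6,
    ORD_SORT_KEYS = 7, ORD_SORT_DUP_ALLOWED = 8, ORD_SORT_DUP_FIRST = 9,
    ORD_SORT_DUP_LAST = 10, ORD_SORT_DUP_NOT_ALLOWED = 11
};

static unsigned set_order(unsigned sflags)
{
    return sflags & ORDER_MASK;
}

/* Codes five through 11 describe a sorted set. */
static int is_sorted(unsigned sflags)
{
    unsigned o = set_order(sflags);
    return o >= ORD_SORT_DBKEY && o <= ORD_SORT_DUP_NOT_ALLOWED;
}

/* Only code 11 forbids duplicate member keys. */
static int rejects_duplicates(unsigned sflags)
{
    return set_order(sflags) == ORD_SORT_DUP_NOT_ALLOWED;
}
```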

APPENDIX G. DIFFERENCES IN THE SCHEMA DDL.

This Appendix gives a detailed description of the differences between the DDL in the UNIX DBMS and that described in Ref. 2. The Appendix is organized in parallel with Section 3 of Ref. 2, and section references below are sections in Ref. 2 unless otherwise noted. The metalanguage used to describe entries is identical to that of Ref. 2, with the exceptions that no distinction is made between required and optional words and that options enclosed in brackets are separated by virgules ("/") in lieu of being on separate lines.

A. Words.

The rule in Section 3.0.3 for forming words applies to the DDL. However, when validating a sub-schema, the DBMS considers upper and lower case letters to be equivalent and considers underscore ("_") a synonym for hyphen ("-").
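These equivalence rules can be sketched as a comparison that folds case and maps underscore to hyphen before comparing characters. The function names are hypothetical; the DBMS's actual validation routine is not reproduced in this report.

```c
#include <assert.h>
#include <ctype.h>

/* Map a character to its canonical form for name comparison. */
static int fold(int c)
{
    if (c == '_')
        return '-';
    return toupper((unsigned char)c);
}

/* Return 1 if two DDL names are equivalent under the rules above. */
static int ddl_names_equal(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (fold(*a) != fold(*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;   /* both names must end at the same point */
}
```

Under these rules "PART_NO" and "part-no" name the same item, while "PART-NO" and "PARTNO" do not.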

B. Schema Entry (Section 3.1.0).

The "ON [ERROR DURING]" clause is not supported. In the "PRIVACY LOCK" clause, only the "[FOR COPY]" option is supported. Specifying alternate privacy locks for the same function is not supported.
C. Area Entry (Section 3.2.0).

Specifying alternate privacy locks for the same function
is not supported.

D. Record Sub-entry (Section 3.3.0).

In the "PRIVACY LOCK" clause, specifying alternate
privacy locks is not supported. Specifying a data base
procedure for area selection in the "LOCATION MODE" clause is
not supported.

E. Data Sub-entry (Section 3.3.0).

In the "TYPE" clause, the only arithmetic types
supported are "BINARY FIXED", "BINARY FLOAT 4" and "BINARY
FLOAT 8". The word "BINARY" is assumed if missing, and "4"
is assumed if neither "4" nor "8" is specified. If the
"TYPE" clause uses the "BIT integer-3" option, "integer-3"
must be a multiple of eight. Since all records must be
fixed length, the "OCCURS data-base-identifier-1 TIMES"
option is not supported. "RESULT" and "SOURCE" items, both
virtual and actual, are not supported. The "FOR
{ENCODING/DECODING}" clause is not supported. Specifying
alternate privacy locks on the same function is not
supported.
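The defaulting and validity rules for the "TYPE" clause can be sketched as a single check applied after parsing. The representation below is hypothetical: "struct ddl_type", its "size" field, and "ddl_check_type" are illustrative names, and the omitted word "BINARY" is assumed to have been supplied by the parser, so it is not modeled here. The requirement that the bit length be positive is an added assumption.

```c
/* Hypothetical parsed form of a TYPE clause. */
enum ddl_kind { T_FIXED, T_FLOAT, T_BIT };

struct ddl_type {
    enum ddl_kind kind;
    int size;  /* FLOAT: 4 or 8 (0 = not written); BIT: integer-3 */
};

/* Apply the default precision and the checks described above.
 * Returns 0 if the clause is acceptable, -1 otherwise. */
int ddl_check_type(struct ddl_type *t)
{
    switch (t->kind) {
    case T_FIXED:
        return 0;
    case T_FLOAT:
        if (t->size == 0)
            t->size = 4;  /* "4" is assumed if neither is given */
        return (t->size == 4 || t->size == 8) ? 0 : -1;
    case T_BIT:
        /* integer-3 must be a positive multiple of eight */
        return (t->size > 0 && t->size % 8 == 0) ? 0 : -1;
    }
    return -1;
}
```

With this check, "BIT 16" is accepted while "BIT 12" is rejected, and a bare "FLOAT" becomes "BINARY FLOAT 4".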
F. Set Sub-entry (Section 3.4.0).

The "TEMPORARY" option of the "ORDER" clause is not
supported. The "[INDEXED [NAME IS index-name-1]]" clause is
not supported. Specifying alternate privacy locks for the
same function is not supported.

G. Member Sub-entry (Section 3.4.0).

In the "RANGE KEY" clause, no more than sixteen
data-base-identifiers may be specified. The "DUPLICATES NOT
ALLOWED FOR" clause is not supported. No more than seven
"SEARCH KEY" clauses can be specified. In the "SEARCH KEY"
clause, the "USING" phrase is not meaningful, since all
search keys are implemented using indices. In Format 1 of
the "SET SELECTION" clause, the "DATA-BASE-KEY EQUAL TO
data-base-identifier-1" and "CALC-KEY EQUAL TO
data-base-data-name-2 [data-base-data-name-3] ..." options are
not supported. In the same clause, the only form of the "THEN
THRU" phrase supported is the one without the "EQUAL TO"
option. Specification of alternate privacy keys on the same
function is not supported.
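The two numeric limits above could be enforced by a simple check when a member sub-entry is compiled. This sketch is illustrative; the macro and function names are hypothetical, and the counts are assumed to have been gathered by the parser.

```c
/* Limits stated for the member sub-entry (macro names hypothetical). */
#define MAX_RANGE_KEY_IDS 16  /* data-base-identifiers per RANGE KEY */
#define MAX_SEARCH_KEYS    7  /* SEARCH KEY clauses per member sub-entry */

/* Return 0 if a member sub-entry stays within both limits, -1 otherwise. */
int member_check_limits(int n_range_ids, int n_search_keys)
{
    if (n_range_ids > MAX_RANGE_KEY_IDS)
        return -1;
    if (n_search_keys > MAX_SEARCH_KEYS)
        return -1;
    return 0;
}
```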
APPENDIX H. CONSTANT FILE CONTENTS.


As mentioned in Section III.L.K, a constant file must
be created by the DBMS compiler to allow the skeleton
program to be transformed into the schema DBM for a particular
data base. This file dimensions the tables and arrays of
the schema DBM and initializes the arrays used in processing
data base procedures. The specific tables and arrays are
described below.

A. The Character Buffer.

The character buffer, "scharbuf", is used for the
storage of character strings and several other types of
variable-length data. This character buffer must be large
enough to contain the schema name; the path names of all
schema files (including temporary ones); the item names of
every item in every record; the primary key, search key and
selection data for every membership vector; the data and
validity check descriptions of every item vector in every
record; and the member record string for every set vector.
B. Data Base Procedure Table.

This table is composed of two arrays. The first array,
"procname", is an array of character strings and must be
initialized with the name of every data base procedure
mentioned in the schema. The second array, "procpoint", is an
array of function pointers which must be initialized to
point to the data base procedures listed in "procname". The
references to data base procedures in "procpoint" cause the
C compiler to load these functions into the schema DBM.
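A sketch of how the two parallel arrays might appear in a generated constant file, together with a lookup by name, is given below. The procedure names "chkage" and "chksal", their bodies, and the function "call_dbproc" are hypothetical; the "int (void)" signature is also an assumption (a compiler of the period would have declared "int (*procpoint[])()"). In the actual DBMS the procedures would be compiled separately, and it is the references in "procpoint" that bring them into the schema DBM.

```c
#include <string.h>

/* Two hypothetical data base procedures named in a schema. */
static int chkage(void) { return 1; }
static int chksal(void) { return 2; }

/* Parallel arrays: procname[i] is the name of the procedure at procpoint[i]. */
static const char *procname[] = { "chkage", "chksal" };
static int (*procpoint[])(void) = { chkage, chksal };
#define NPROC (sizeof procname / sizeof procname[0])

/* Call the procedure registered under `name`; returns -1 if unknown. */
int call_dbproc(const char *name)
{
    size_t i;

    for (i = 0; i < NPROC; i++)
        if (strcmp(procname[i], name) == 0)
            return procpoint[i]();
    return -1;
}
```

Keeping the names and pointers in separate arrays with a shared index mirrors the description above; a single array of name/pointer structures would serve equally well.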

C. Privacy Vector Array.

This array of structures of type "privector" must be
dimensioned large enough to hold the maximum number of item
privacy locks defined in any one record entry. The format
of a "privector" structure is described in Appendix F,
Section A.3.

D. Record Buffer Array.

This is a character array which is used to provide
record buffers for the various areas, records and sets. Its
dimension must be the number of areas and sets times the
maximum record size, plus the sum of the sizes of the
individual records.
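The dimension rule can be written as (areas + sets) x (maximum record size) + (sum of all record sizes). For example, a schema with 2 areas, 3 sets and record types of 40, 100 and 60 bytes would need (2 + 3) x 100 + (40 + 100 + 60) = 700 bytes. The function below is a hypothetical sketch of that computation; its name and parameters are not from the DBMS source.

```c
#include <stddef.h>

/* Record buffer dimension: (areas + sets) buffers of the maximum
 * record size, plus one buffer sized exactly for each record type. */
size_t recbuf_size(size_t n_areas, size_t n_sets,
                   const size_t *rec_sizes, size_t n_recs)
{
    size_t i, max = 0, sum = 0;

    for (i = 0; i < n_recs; i++) {
        sum += rec_sizes[i];
        if (rec_sizes[i] > max)
            max = rec_sizes[i];
    }
    return (n_areas + n_sets) * max + sum;
}
```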
E. Area Vector Array.

This array of structures of type "areavect" must be
dimensioned large enough to provide one area vector for each
defined area. An "areavect" structure is described in
Appendix F, Section B.

F. Record Vector Array.

This array of structures of type "recvector" must be
dimensioned large enough to provide one record vector for
each defined record type. A description of the "recvector"
structure is contained in Appendix F, Section D.

G. Member Vector Array.

This array of structures of type "member" must be
dimensioned large enough to provide one member vector for every
record membership defined in every set. A description of
the "member" structure is contained in Appendix F, Section
D.1.

H. Item Vector Array.

This array of structures of type "itemvect" must be
dimensioned large enough to provide an item vector for every
item in every record type. A description of the "itemvect"
structure is contained in Appendix F, Section D.2.
I. Set Vector Array.

This array of structures of type "setvect" must be
dimensioned large enough to provide a set vector for every
set defined in the schema. A description of the "setvect"
structure is contained in Appendix F, Section E.
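The sizing rules for the privacy vector array (Section C) and the vector arrays (Sections E through I) can all be derived in one pass over a summary of the compiled schema. The sketch below assumes a hypothetical per-record summary, "struct recinfo", and returns the six dimensions that the constant file would then write out; the structure names and the function "compute_dims" are illustrative only. Counting memberships per record type gives the same total as counting member record types per set.

```c
#include <stddef.h>

/* Hypothetical per-record summary gathered while compiling the schema. */
struct recinfo {
    size_t nitems;      /* items in this record type */
    size_t nprivlocks;  /* item privacy locks in this record entry */
    size_t nmemb;       /* set memberships of this record type */
};

/* Dimensions of the constant-file arrays. */
struct dims {
    size_t privector, areavect, recvector, member, itemvect, setvect;
};

struct dims compute_dims(size_t n_areas, size_t n_sets,
                         const struct recinfo *r, size_t n_recs)
{
    /* One area vector per area, one record vector per record type,
     * one set vector per set. */
    struct dims d = {0, n_areas, n_recs, 0, 0, n_sets};
    size_t i;

    for (i = 0; i < n_recs; i++) {
        if (r[i].nprivlocks > d.privector)
            d.privector = r[i].nprivlocks;  /* maximum in any one record */
        d.member += r[i].nmemb;             /* one per record membership */
        d.itemvect += r[i].nitems;          /* one per item per record */
    }
    return d;
}
```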